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1. Introduction 


Edge detection forms the first stage in a very large number of vision modules, 
and any edge detector should be formulated in the appropriate context. However, 
the requirements of many modules are similar and it seems as though it should be 
possible to design one edge detector that performs well in several contexts. The 
crucial first step in the design of such a detector should be the specification of a 
set of performance criteria that capture these requirements. The specification of 
these criteria and the derivation of optimal operators from them forms the subject 
of this report. 

The operation of the edge detector is best illustrated by the example in figure 
(1.1), which was produced by the detector described in this report. The detector 
accepts discrete digitized images and produces an “edge map” as its output. The 
edge map includes explicit information about the position and strength of edges, 
their orientation, and the “scale” at which the change took place. Although they 
are not made explicit, it is also possible to compute the uncertainty in position or 
strength of an edge from the quantites in the edge map. The example in figure (1.1) 
includes position information only. 

A digitized image contains a great deal of redundancy. There is redundancy in 
the information theoretic sense (it is possible to compress the sampled data into fewer 
bits without changing the reconstructed image significanty). Even after efficient 
encoding, much of the what remains is not useful to later vision modules. These 
modules typically require structural information, i.e. details of surface orientation 
and the material of which the visible surfaces comprise. Where the surfaces are 
smooth and of uniform reflectance, shape from shading (Horn 1975) may be applied 
to obtain surface orientation. In many other modules such as shape from motion 
(Ullman 1979 and Hildreth 1983), shape from contour (Stevens 1980), shape from 
texture (Witkin 1980), and Stereo (Marr and Poggio 1979, Grimson 1981) structural 
properties of underlying surfaces are inferred from edge contours. In particular, 
step changes in intensity are important because they typically correspond to sharp 
changes in orientation or material, or to object boundaries. Edge detection is a 
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Figure 1.1. Positional information provided by the edge detector applied to an 
image of some mechanical parts 









means of generating compact descriptions which preserve most of the structural 
information in an image. 

Some previous formulations have chosen the first or second derivative as the 
appropriate quantity to characterize step edges, and have formed optimal estimates 
of this derivative over some support. Examples of first derivative operators are the 
operators of Roberts (1965) and Macleod (1970), while Modestino and Fries (1977) 
formed an optimal estimate of the two-dimensional Laplacian over a large support. 
Marr and Hildreth (1980) suggested the Laplacian of a broad Gaussian since it 
optimizes the trade-off in localization and bandwidth. There are problems with the 
Laplacian however, and the whole concept of derivative estimation seems to have 
poor foundation. These criticisms will be made specific in chapter 7. 

There is a second major class of formulations in which the image surface is 
approximated by a set of basis functions and the edge parameters are estimated 
from the modelled image surface. Examples of this technique include the work of 
Prewitt (1970), Hueckel (1971) and Haralick (1982). These methods allow more 
direct estimates of edge properties such as position and orientation, but since the 
basis functions are usually not complete, the properties apply only to a projection 
of the actual image surface on to the subspace spanned by the basis functions. 
However, the basis functions are a major factor in operator performance, especially 
its ability to localize edges. 

In this report we begin with a traditional model of a step edge in white Gaussian 
noise and try to formulate precisely the criteria for effective edge detection. We 
assume that detection is performed by convolving the noisy edge with a spatial 
function f(x) (which we are trying to find) and by marking edges at the maxima in 
the output of this convolution. We then specify three performance criteria on the 
output of this operator. 

(i) Good detection. There should be a low probablity of failing to mark real edge 
points, and low probability of falsely marking non-edge points. Since both 
these probabilities are monotonically decreasing functions of the output signal 
to noise ratio, this criterion corresponds to maximizing signal to noise ratio. 
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(ii) Good localization. The points marked as edges by the operator should be as 

close as possible to the centre of the true edge. 

(iii) Only one response to a single edge. This is implicitly captured in (i) since 
when two nearby operators respond to the same edge, one of them must be 
considered a false edge. However, the mathematical form of the first criterion 
did not capture the multiple response requirement and it had to be made 
explicit. 

The first result of the analysis for step edges is that (i) and (ii) are conflicting 
and that there is a trade-off or uncertainty principle between them. Broad operators 
have good signal to noise ratio but poor localization and vice-versa. A simple 
choice of the mathematical form for the localization criterion gives a product of 
a localization term and signal to noise ratio that is constant. Spatial scaling of 
the function f(x) will change the individual values of signal to noise ratio and 
localization but not their product. Given the analytic form of a detection function, 
we can theoretically obtain arbitrarily good signal to noise ratio or localization from 
it by scaling, but not simultaneously. From the analysis we can conclude that there 
is a single best shape for the function / which maximizes the product and that if we 
scale it to achieve some value of one of the criteria, it will simultaneously provide 
the maximum value for the other. To handle a wide variety of images, an edge 
detector needs to use several different widths of operator, and to combine them in 
a coherent way. By forming the criteria for edge detection as a set of functionals 
of the unknown operator /, we can use variational techniques to find the function 
that maximizes the criteria. 

The second result is that the criteria (i) and (ii) by themselves are inadequate 
to produce a useful edge detector. It seems that we can obtain maximal signal to 
noise ratio and arbitrarily good localization by using a difference of boxes operator. 
The difference of boxes (see figure 2.2) was suggested by Rosenfeld and Thurston 
(1971) and was used by Herskovits and Binford (1970). If we look closely at the 
response of such an operator to a noisy step edge we find that there is an output 
maximum close to the centre of the edge, but that there may be many others 
nearby. We have not achieved good localization because there is no way of telling 
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which of the maxima is closest to the true edge. The addition of criterion (iii) 
gives an operator that has very low probability of giving more than one maximum 
in response to a single edge, and it also leads to a finite limit for the product of 
localization and signal to noise ratio. 

The third result is an analytic form for the operator. It is the sum of four 
complex exponentials and can be approximated by the first derivative of a Gaussian. 
A numerical finite dimensional approximation to this function was first found using a 
stochastic hill-climbing technique. This was done because it was much easier to write 
the multiple response criterion in deterministic form for a numerical optimization 
than as a functional of /. Specifically, the numerical optimizer provides candidate 
outputs for evaluation, and it is a simple matter to count the number of maxima 
in one of the outputs. To express this constraint analytically we need to find the 
expectation value of the number of maxima in the response to an edge, and to 
express this as a functional on /, which is much more difficult. The first derivative 
of a Gaussian has been suggested before (Macleod 1970). It is also worth noting 
that in one dimension the maxima in the output of this first derivative operator 
correspond to zero-crossings in the output of a second derivative operator. 

Several further results relate to the extension of the operator to two (or more) 
dimensions. They can be summarized roughly by saying that the detector should 
be directional, and if the image permits, the more directional the better. The issue 
of non-directional (Laplacian) versus directional edge operators has been the topic 
of debate for some time, compare for example Marr (1976) with Marr and Hildreth 
(1980). To summarize the argument presented here, a directional operator can be 
shown to have better localization than the Laplacian, signal to noise ratio is better, 
the computational effort required to compute the directional components is slight 
if efficient algorithms are used, and finally the problem of combining operators 
of several orientations is difficult but not intractable. It is, for example, much 
more difficult to combine the outputs of operators of different sizes, since their 
supports differ markedly. For a given operator width, both signal to noise ratio and 
localization improve as the length of the operator (parallel to the edge) increases, 
provided of course that the edge does not deviate from a straight line. When 
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the image does contain long approximately straight contours, highly directional 
operators are the best choice. This means several operators will be necessary to 
cover all possible edge orientations, and also that less directional operators will also 
be needed to deal with edges that are locally not straight. 

The problem of combining the different operator widths and orientations is 
approached in an analogous manner to the operator derivation. We begin with 
the same set of criteria and try to choose the operator that gives good signal to 
noise ratio and best localization. We set a minimum acceptable error rate and 
then choose the smallest operator with greater signal to noise than the threshold 
determined by the error rate. In this way the global error rate is fixed while the 
localization of a particular edge will depend on the local image signal to noise ratio. 
The problem of choosing the best operator from a set of directional operators is 
simpler, since only one or two will respond to an edge of a particular orientation. 
The problem of choosing between a long directional operator and a less directional 
one is theoretically simple but difficult in practice. Highly directional operators are 
clearly preferable, but they cannot be used for locally curved edges. It is necessary 
to associate a goodness of fit measure with each operator that indicates how well 
the image fits the model of a linearly extended step. When the edge is good enough 
the directional operator output is used and the output of less directional neighbours 
is suppressed. 

While the detection of step edges is the primary goal of the report, chapter 4 
gives a general form for the optimality criteria. Using this general form, it is possible 
to design optimal operators for arbitrary features. A numerical optimization is 
used to find the impulse response of the operator given an input waveform to be 
detected. The technique is illustrated by the derivation of operators for ridge, roof 
and step edges. Of these the ridge and step detectors have been tested on real 
images. The particular problems of extending the one-dimensional ridge operator 
to two dimensions, and the problem of integrating the step and ridge detector 
outputs are discussed. 

Following the analysis we outline some simple experiments which seem to 
indicate that the human visual system is performing similar selections (at some 
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computational level), or at least that the computation that it does perform has 
a similar set of goals. We find that adding noise to an image has the effect of 
producing a blurring of the image detail, which is consistent with there being several 
operator sizes. More interestingly, the addition of noise may enable perception of 
changes at a large scale which, even though they were present in the original image, 
were difficult to perceive because of the presence of sharp edges. Our ability to 
perceive small fluctuations in edges that are approximately straight is also reduced 
by the addition of noise, but the impression of a straight edge is not. 

As a guide to the reader, chapters 2 and 3 form the core of the analysis for 
step edges. They also contain most of the signal theory, and the general reader 
may wish to skim over them. The first section of chapter 3 should be read however, 
as it includes the translation of the theoretical operator into a practical algorithm. 
Chapter 4 is easier going and contains a more general form for the optimality 
criteria. It gives examples of the solution of the variational problem for roof and 
ridge edges. Chapter 5 is titled “details of implementation” and it may be tempting 
to avoid it as being too low-level. However it contains several efficient algorithms 
for Gaussian convolution, and may have applications outside the scope of the 
present work. Finally, chapters 6 and 7 give weight to the analysis by showing the 
performance of the operator on real images and by comparing it both experimentally 
and theoretically with some other edge detectors. 


11 



2. One-Dimensional Formulation for Step Edges 

The basic design problem is illustrated in figure (2.1). We are trying to detect 
a step edge which is bathed in Gaussian noise, figure (2.1a). We convolve with some 
spatial function (2.1b) and mark edges at maxima in the result of this convolution 
(2.1c). The objective is to find the spatial function (call it /) which gives the “best” 
output, where best is defined by a precise set of criteria on step edge detection. 

Some preliminaries on notation ; when we speak of an edge detection “operator” 
we mean a mapping from a one or two dimensional intensity function (the image 
or a linear slice through it) to an intensity function of the same dimension. If the 
operator is linear and shiR invariant, then it can be represented by a convolution of 
the intensity function with the “impulse response” (one dimension) or “point-spread 
function” (two dimensions) of the operator, which is the result of applying the 
operator to a unit impulse at the origin. ShiR invariance is clearly a desirable 
property of an edge operator. To begin with we will consider only linear shiR 
invariant operators and later we will apply decision procedures to their outputs, 
which will lead to shiR invariant non-linear operators. The operator that describes 
the mapping from an image to the final representation of edge contours is called 

the “edge detector”. 

The key to the design of an effective edge operator is the accurate evaluation of 
its performance. If we can write down the evaluation function in closed mathematical 
form, we can apply standard tools such as the calculus of variations to find the 
operator that maximizes it. As with many optimization problems, the key to 
obtaining a useful answer is to ask the right questions. The edge detection problem 
is no exception, as should become apparent in the course of the derivation. Several 
passes at the evaluation function had to be made before one was found that closed 
all the “loopholes” and excluded operators that were impractical for “obvious” 
reasons. This is not to say that the problem became one of finding a question to 
fit a proposed solution, but rather that the question was always the same, it was 
just very difficult to express in a closed form that was simple enough to yield 
a variational problem that could be solved. By way of contrast, it was relatively 
easy to obtain a similar solution using a Monte Carlo optimization, because the 
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Figure 2.1. (a) The step edge model, (b) The detection function to be derived, 
(c) The result of tne convolution of this function with the edge. 

evaluation could be done directly on the output of a candidate operator. The real 
problem then, was the translation of the intuitive performance goals to functionals 
that depended directly on the form of the operator. This section describes the main 
stages in the translation process. 



2.1. An Uncertainty Principle 

We consider first the one dimensional edge detection problem. The goal is to 
detect and mark step changes in a signal that contains additive white noise. We 
assume that the signal is flat on both sides of the discontinuity, and that there are 
no other edges close enough to affect the output of the operator (see figure 2.1). 
We need to somehow combine the two goals of accurate detection and localization 
into a single evaluation functional. The detection criterion is simple to express 
in terms of the signal to noise ratio in the operator output, i.e. the ratio of the 
output in response to the step input to the output in response to the noise only. 
The localization criterion is more difficult, but a reasonable choice is the inverse of 
the distance between the true edge and the edge marked by the detector. For the 
distance measure we will use the standard deviation in the position of the maximum 
of the operator output. By using local maxima we are making what seems to be 
an arbitrary choice in the mapping from linear operator output to detector output. 
But the mapping must involve some local predicate, and since we are designing a 
linear operator that will respond strongly to step edges, the maxima in its response 
are a logical choice. 

Let the amplitude of the step be A, and let the noise be 71(2). Then the input 
signal I[x) can be represented as 


I(x) = Au—i(x) -{- n[x) 


( 2 . 1 ) 


where u_i(z) is the unit step function defined as 


u-i(x) 


C 


0, for x < 0 

for x > 0 


Let the impulse response of the operator we are seeking be represented by 
the function /(z). Then the output O(z 0 ) of the application of the operator to the 
input /(z) is given by the convolution integral, 


0(z°) = J(z)/(z 0 — z) 


dx 


( 2 . 2 ) 
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We can use the linearity of convolution to split this integral into contributions due 
to the step and to noise only. The output due to the step only is (at the centre of 
the step, i.e. at xq = 0) 


f f(x)Au— 1 (— x)dx = A f f(x) 

J —oo J —oo 


dx 


(2.3) 


While the mean squared response to the noise component only will be 


E 


[+oo 
J /(x)n (— x ) 


where E[y] is the expectation value of y. If the noise is white the above 
simplifies to 



/ 2 (x)n 2 (— x) dx 



dx 


where n§ = E[n 2 (x)] for all x, i.e. n\ is the variance of the input noise. We define 
the output signal-to-noise ratio as the quotient of the response to the step only and 
the square root of the mean squared noise response. 


S.N.R. = 

f 2 (x) dx 

From this expression we can define a measure £ of the signal to noise 
performance of the operator which is independent of the input signal 

S.N.R. = — S and E = J ~ 00 ^ — = (2.5) 

n ° V/+“ / 2 (^) dx 

This then is the first part of our dual criterion, and finding the impulse response 
/ which maximizes it corresponds to finding the best operator for detection only. 

For the localization criterion we proceed as follows. Recall that we chose to 
mark edges at maxima in the output of the operator. For an ideal step we would 
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expect a single maximum at the centre of the edge. Since the signal I(x) contains 
noise we would expect this maximum to be displaced from the true position of the 
edge (at the origin in this case). To obtain a performance measure which improves 
as the localizing ability of the operator improves, we use the reciprocal of the 
standard deviation of the distance of the actual maximum from the centre of the 
true edge. This is not an arbitrary choice, as it gives a composite performance 
criterion which is scale independent, as we shall see. 

A maximum in the output 0(x o) of the operator corresponds to a zero-crossing 
in the spatial derivative of this output. We wish to find the position xq where 

0'(x o) = f f{x)I(x 0 — x)dx — 0 
ax o J — 00 

Which by the differentiation theorem for convolution can be simplified to 

r+00 

/ / / (i)/(a;o — x)dx = 0 

To find zo we again split the derivative of the output 0'(z 0 ) into components 
due to the step and due to noise only (call these 0\ and O f n respectively). 



The response of the derivative filter to the noise only (at any output point) 
will be a Gaussian random variable with mean zero and variance equal to the 
mean-squared output amplitude 

E[0'l{x)\ = nl /_ + ~ f{x) dx (2.7) 

We now add the constraint that the function / should be antisymmetric. 
An arbitrary function can always be split into symmetric and antisymmetric 
components, but it should be clear that the symmetric component adds nothing to 
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the detection or localizing ability of the operator but will contribute to the noise 
components that affect both. The Taylor expansion of O' s (x 0 ) about the origin gives 

O' s (x 0 ) = Affao) ~ zoA/^O) ( 2 - 8 ) 

For a zero-crossing in the output O 1 we require 


O'(x 0 ) = 0' s (x o) + O'Jx o) = 0 (2.9) 

i.e. O' s (x 0 ) = -O'Jxo) and E[O’?{x 0 )} = £[O' n 2 (x 0 )]. Substituting for the two 
outputs from (2.7) and (2.8) we obtain 



J+” nlf\x) dx 
A 2 f t2 ( 0) 



( 2 . 10 ) 


where 6x 0 is an approximation to the standard deviation of the distance of the 
actual maximum from the true edge. The localization is defined as the reciprocal 

of Sx o 


Localization ==■ — 


|/'( 0 )| 


"° >//+£ /'*(*) dx 


Again we define a performance measure A which is a property of the operator only 


Localization — —A 

n 0 


|/'( 0 )| 


>//+» /' 2 (^) dx 


( 2 . 11 ) 


Having obtained both our desired criteria, we now have the problem of 
combining them in a meaningful way. It turns out that if we use the product of the 
two criteria we obtain a measure which is both amplitude and scale independent. 
This measure is a property of the shape of the impulse response / only, and will be 
the same for all functions f w obtained from f by spatial scaling. In fact the choice 
of the combination will not affect the form of the solution since the variational 
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equations depend only on the individual terms in the criteria. The product of the 
two criteria is 


£(/)A (/) = -M- (2.12) 

yfl±Z /’(*) dx y/jlZ f' 2 (x) dx 

To illustrate the invariance of this criterion under changes of scale, we consider 
the performance of an operator whose impulse response is f w where f w (x) = /(|). 
The performance of the scaled operator is 


am Mfw) = 


f—~ 00 f( X ) AX 

i l/'(0)| 

\fj-n / 2 (z) dx 

y/w yjf ,2 (x) dx 


(2.13) 


where the bracketed terms correspond in order to the detection and localization 
criteria. We see from this form that the signal to noise performance of the operator 
varies as y/w, while the localization varies as the reciprocal of y/w. An operator with 
a broad impulse response will have good signal to noise ratio but poor localization 
and vice versa. With this form of the composite criterion though, the product of 
detection and localization terms is the same for all f w . 

This result suggests that there is a class of operators that have optimal 
performance and that they are related by spatial scaling. In fact this result is 
independent of the choice of combination of the criteria. To see this we assume 
that there is a function / which gives the best localization A for a particular E. 
That is, we find / such that 


E(/) — ci and A(/) is maximized 


(2.14) 


Now suppose we seek a second function f w which gives the best possible 
localization while its signal to noise ratio is fixed to a different value, i.e. 


£(/u/) — C 2 while A(/ u; ) is maximized 


(2.15) 
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If we define f w as before, f w (x) = /(J), and further if we set 


w = c\/c\ 

Then the constraint on f w in (2.15) translates to constraint on / which is 
identical with (2.14). So to solve (2.15) we find / such that 

£(/) = ci and A(/) is maximized 

y/w 

Which has the same solution as (2.14). So if we find a single such function /, 
we can obtain maximal localization for any fixed signal to noise ratio by scaling 
/. Thus our choice of the composite criterion was not arbitrary but highlighted a 
natural constraint or “uncertainty principle” for detection of step edges in noise. 
We can obtain arbitrarily good localization or detection by scaling but not both 
simultaneously. 

We will find (eventually) that the above analysis is valid but that the criterion 
as given is still underspecified. While it does lead to a plausible class of solutions, 
performance will be poor because we have so far ignored an important aspect of 
the detection process. Namely the detector should not produce multiple outputs in 
response to a single edge. In the next section we find the solutions to the above 
optimization problem, and highlight their weakness with regard to multiple edge 
responses. 

2.2. The Optimal Operator for Steps 

The optimal edge detection operator has now been defined implicitly by 
equation (2.12). All that remains is to find a function which maximizes this large 
expression. We must make some simplifications before a solution can be found using 
the calculus of variations. We cannot directly find a function which maximizes the 
quotient of integrals in equation (2.12) since each depends on f(x). Instead we set 
all but one of the integrals to undetermined constant values in an analogous manner 
to the method of Lagrange multipliers. We then find the extreme value of the 
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remaining integral (since it will correspond to the maximum in the total expression) 
as a function of the undetermined constants. The values of the constants are then 
chosen so as to maximize the value of the remainder of the expression, which is now 
a function only of the three constants. Given these constants, we can completely 
uniquely specify the function f[x) which gives the global maximum of the criterion. 

The second simplification involves the limits of the integrals. The two integrals 
in the denominator of (2.12) have limits at plus and minus infinity, while the integral 
in the numerator has one limit at zero and the other at minus infinity. Since the 
function / should be antisymmetric, we can use the latter limit for all integrals. 
The denominator integrals will have half the value over this subrange that they 
would have had over the full range. Also, this enables the value of /'(O) to be set as 
a boundary condition, rather than expressed as an integral of f". The lower limit 
of all the integrals at minus infinity should be set to some finite negative value, say 
—W since we will be dealing with an operator of finite extent. These simplifications 
allow us to exploit the isoperimetric constraint condition (see Courant and Hilbert 
1953). This allows us to combine a set of constraint integrals that share the same 
limits as the integral being extremized into a single variational equation. 

So the problem of finding the maximum of equation (2.12) reduces to that 
of finding the minimum of the integral in the denominator of the S.N.R. term, 
subject to the constraint that the other integrals remain constant. By the principle 
of reciprocity, we could have chosen to extremize any of the integrals while keeping 
the others constant, but the solution should be the same. We seek some function / 
chosen from a space of admissible functions that minimizes the integral 

/*(*) dx (2.16) 



subject to 
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J_ w f'\ x ) dx = c i 


m = c 3 (2.17) 

The space of admissible functions in this case will be the space of all continuous 
functions that satisfy certain boundary conditions, namely that /(0) = 0 and 
/(— \Y) = 0. These boundary conditions are necessary to ensure that the integrals 
evaluated over finite limits accurately represent the infinite convolution integrals. 
That is, if the nth derivative of / appears in some integral, the function must be 
continuous in its (n-l)st derivative over the range (—oo, +oo). This implies that the 
values of / and its first (n-1) derivatives must be zero at the limits of integration, 
since they must be zero outside this range. 

The functional to be minimized is of the form J* F(x, f, f) and we have a 
series of constraints that can be written in the form Gi(x, /, f) — . Since 

the constraints are isoperimetric, i.e. they share the same limits of integration as 
the integral being minimized, we can form form a composite functional ty{x,f,f) 
as a linear combination of the functionals that appear in the expression to be 
minimized and in the constraints (Courant and Hilbert 1953). Finding a solution for 
this unconstrained problem is equivalent to finding the solution to the constrained 
problem. The composite functional is 


*(*,/,/') = + XiGi(*,/,/0 + X 2 G 2 (x,/,/') + ... 

Substituting, 


*(*, /. /') = f + Xi /' 2 + X 2 / (2.18) 

It may be seen from the form of this equation that the choice of which integral 
is extremized and which are constraints is arbitrary, the solution will be the same. 
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This is an example of what is known as reciprocity in variational problems. The 
choice of an integral from the denominator is simply convenient since the standard 
form of the Euler equations applies to minimization problems. The Euler equation 
that corresponds to this functional is 


\P fl — = 0 

ax 


Where f denotes the partial derivative of with respect to /. This gives 


2f(x) - 2Xi/"(z) + X 2 = 0 


(2.19) 


The general solution of this differential equation is 


f(x) = 


i ax i —ax 

y-faie -f a 2 e 


( 2 . 20 ) 


Where a = X 1 ^ and the constants a\ and 02 are determined by the boundary 
conditions /(0) = 0 and /(— W) = 0. When these constraints are added the 
function / can be written in the form 


«■> - - -s 11 ) 


cosh a 


( 2 . 21 ) 


From this we can obtain expressions for the signal-to-noise ratio and localization as 
a function of the parameters \i and X 2 . To simplify the expressions we will assume 
a width W of 2 and make use of the scaling properties from equation (2.13). This 


gives 


£ = 


2a cosh a — 2 sinh a 
^2a 2 cosh 2a — 3a sinh 2a -f- 4a 2 


( 2 . 22 ) 


A = 


a sinh a 

'a sinh 2a — 2a 2 


(2.23) 
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Both these expressions are functions only of a, and we can investigate the behaviour 
of / as a tends to its limiting values 0 and -{-oo. As a tends to zero we find that 
function / tends to a parabola whose equation is 

/(*) = -X 2 a 2 (l - i 2 ) (2.24) 

The corresponding values of signal-to-noise ratio and localization are 



When the value of a approaches infinity, we find that the function approaches 
a constant over the range (-2,0) (recalling that W = 2), and that the signal-to-noise 
ratio tends to 1. This is a very small increase over the corresponding value as a 
tended to zero. However, the localization term, increases without bound. Prom 
this result it would seem that a difference of boxes function (the antisymmetric 
extension of the derived function over the range [-2,2]) gives the best possible 
signal-to-noise ratio with arbitrarily good localization. This function is in fact the 
optimal Wiener filter for the step edge. 

This operator has been used quite extensively because of its simplicity and 
because it is easy to compute, as in the work of Rosenfeld and Thurston (1971), and 
in conjunction with lateral inhibition in Herskovits and Binford (1970). However it 
has a very high bandwidth and tends to exhibit many maxima in its response to 
noisy step edges, which is a serious problem when the imaging system adds noise 
or when the image itself contains textured regions. These extra edges should be 
considered erroneous according to the first of our criteria. However, the analytic 
form of this criterion was derived from the response at a single point (the centre 
of the edge) and did not consider the interaction of the responses at several nearby 
points. We need to make this explicit by adding a further constraint to the solution. 

2.3. Eliminating Multiple Responses 

If we examine the output of a difference of boxes edge detector we find that the 
response to a noisy step is a roughly triangular peak with numerous sharp maxima 
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in the vicinity of the edge (see figure 2.2). These maxima are so close together that 
it is not possible to select one as the response to the step while identifying the 
others as noise. We need to add to our criteria the requirement that the function 
/ will not have “too many” responses to a single step edge in the vicinity of the 
step. We need to limit the number of peaks in the response so that there will be 
a low probability of declaring more than one edge. Ideally, we would like to make 
the distance between peaks in the noise response approximate the width of the 

response of the operator to a single step. This width will be about the same as the 
operator width W. 

In order to express this as a functional constraint on /, we need to obtain 
an expression for the distance between adjacent noise peaks. We first note that 
the mean distance between adjacent maxima in the output is twice the distance 
between adjacent zero-crossings in the derivative of the operator output. Then we 
make use of a result due to Rice (1944, 1945) that the average distance between 
zero-crossings of the response of a function g to Gaussian noise is 


x ave 



(2.26) 


Where R(t) is the autocorrelation function of g. In our case we are looking for the 
mean zero-crossing spacing for the function /'. Now since 



and 



The mean distance between zero-crossings of f f will be 


x zc 


j s+zn*)d* \i 

\j±£f" 2 (x)dx) 


(2.27) 


The distance between adjacent maxima in the noise response of /, denoted 

X max, will be twice x 2C . We set this distance to be some fraction k of the operator 
width. 
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Figure 2.2. Responses of difference of boxes and first derivative of Gaussian 
operators to a noisy step edge 


iffioi — 2ijic — kW 

This new constraint adds only one term to the composite functional 'P since 
the integral of f n already appears in from the localization criterion. While in 
the original functional this integral appeared in the denominator of a quantity to 
be maximized, (i.e. the localization criterion) it now appears in the numerator of 
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the mean distance between maxima, which is a constraint on the solution. It is now 
no longer clear what the sign of its Lagrange multiplier should be. This leads to 
several possible solutions for / as we shall see. The new functional is given by 


*(*. /• /") = f + X ,/ 2 + X 2 /" 2 + X 3 / (2.28) 


The Euler equation corresponding to a functional of second order is 



dx 2 



When the above is substituted into the Euler equation we get 


2/(*) ~ 2 X 1 /"(z) + 2 \ 2 f""(x) + X 3 = 0 


(2.29) 


The solution of this differential equation is the sum of a constant and a set 
of four exponentials of the form e 7X where 7 derives from the solution of the 
corresponding homogeneous differential equation. Now 


2 2X17 2 2X27 4 == 0 



This equation may have roots that are purely imaginary, purely real or complex 
depending on the values of Xi and X 2 . FYom the composite functional ^ we can 
infer that X 2 is positive (since /" 2 is to be minimized) but it is not clear what 
the sign or magnitude of Xj should be. The Euler equation supplies a necessary 
condition for the existence of a minimum, but it is not a sufficient condition. By 
formulating such a condition we can resolve the ambiguity in the value of X x . To 
do this we must consider the second variation of the functional. Let 
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Then by Taylor’s theorem, 


An = r Hx.fj'.ndx 

JX 0 


J[f + C9] = J\f] + th[f,g\+^t i h[f + P9,9\ 

where p is some number between 0 and e, and g is chosen from the space of 
admissible functions, and where 


Jl[f ,9] = / ’*'/? + *f‘9' + y>I"9" d* 

J Xq 


Hf>9 ] = / 'Vffg 2 + *f'f‘9 n + *f"f"9" 2 

J Xq 


+24'//<gs' + 2 y>ri«g'g n + I'il jf»gg" dx 


(2.31) 


Note that J\ is nothing more than the integral of g times the Euler equation 
for / (transformed using integration by parts) and will be zero if / satisfies the 
Euler equation. We can now define the second variation 6 2 J as 

The necessary condition for a minimum is S 2 J > 0. We can substitute for the 
second partial derivatives of ^ from (2.29) and we get 


/ 9 2 + ^l9i + ^29 ij dx> 0 

JXq 


(2.32) 


which we transform using integration by parts to 


r 

Jx 0 


9 2 — Xi gg xx + X dx > 0 


which can be written as 
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L ( s - + ( Xs - 7 )^ dx > 0 

The integral is guaranteed to be positive if the expression being integrated is 
positive for all x , so if 


then the integral will be positive for all x and for arbitrary g, and the extremum 
will certainly be minimum. If we refer back to (2.28) we find that this condition 
is precisely that which gives complex roots for 7 , so we have both guaranteed 
the existence of a minimum and resolved a possible ambiguity in the form of the 
solution. We can now proceed with the derivation and assume four complex roots 
of the form 7 = 4i iw With a, u real, such that 

a 2 — u 2 = and 4a 2 u 2 

2a2 

The general solution may now be written 


__ X^ — 4X2 


4Xi 


(2.33) 


f(x) — flie ai sino;a: + fl 2 e a:r coswa:-f 03 c ax sin ojx -f 04 c ai coso ;2 -f- c ( 2 . 34 ) 
This function is subject to the boundary conditions 

m = 0 f[~w) =0 no) = s f(-w) =0 

Where s is an unknown constant equal to the slope of the function / at 
the origin. These four boundary conditions enable us to solve for the quantities 
a i through a 4 in terms of the unknown constants a, u, c and s. The boundary 
conditions may be rewritten 


02 + 04 + c = 0 
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a\e a sin a; a 2 e a cosw + 03c a sin w + 04c a cosw + c = 0 


fljW -j- CL20t ~j“ — OL\OL — S 


aie a (a sin a; -j- wcosw) -f- a 2 e a (acosw — w sinw) 

-fa3e —a (—a sin w -j- w cos w) -j- a4e —a (—a cos a; — w sin w) — 0 


(2.35) 


These equations are linear in the four unknowns oi, 02 , <*3> <H and when solved 
they yield 


di — c^ct[(J — Q:) sin 2w — aw cos 2w -}- (—2w 2 sinh a -f- 2a 2 e a ) sin w 
-f-2aw sinh a cos w -j- we” 2a (a + a) — auj/ 4^w 2 sinh 2 a — a 2 sin 2 w j 


a 2 = c^a (<7 — a) cos 2w -j- aw sin 2w — 2aw cosh a sin w — 2w 2 sinh a cos w 
-}-2w 2 e —a sinh a -j- a(a — <t)^/4^w 2 sinh 2 a — a 2 sin 2 w^ 


a 3 = — a(a + a) sin 2w -j- <*w cos 2w -j- (2w 2 sinh a + 2a 2 e a ) sin w 

-f-2aw sinh a cos w -f- we 2a (cr — a) — <rw^/4^w 2 sinh 2 a — a 2 sin 2 w^ 


a 4 _ c^—a(a + a) cos 2w — aw sin 2w + 2aw cosh a sin w + 2w 2 sinh a cos w 
—2w 2 e a sinh a + a(a + <j)^/4^w 2 sinh 2 a — a 2 sin 2 w^ 


(2.36) 


where a is the slope s at the origin divided by the constant c. On inspection of 
these expressions we can see that 03 can be obtained from a\ by replacing a by 
—a, and similarly for 04 from o 2 . 
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The function f is now parametrized in terms of the constants a, u, a and c. 
We have still to find the values of these parameters which maximize the quotient 
of integrals that forms our composite criterion. To do this we first express each 
of the integrals in terms of the constants. Since these integrals are very long 
and uninteresting, they are not given here but for the sake of completeness they 
are included in Appendix I. We have reduced the problem of optimizing over 
an infinite-dimensional space of functions to a non-linear optimization in three 
variables a, u and a (as expected, the combined criterion does not depend on c). 
Unfortunately the resulting criterion, which must still satisfy the multiple response 
constraint, is probably too complex to be solved analytically, and numerical methods 
must be used to provide the final solution. 

In fact there is really no best function / for a given W because the shape of / 
will depend on the multiple response constraint, i.e. it will depend on how far apart 
we force the adjacent responses. Figure (2.3) shows the operators that result from 
particular choices of this distance. Recall that there was no single best function for 
arbitrary w, but a class of functions which were obtained by scaling a prototype 
function by w. We will want to force the responses further apart as the signal to 
noise ratio in the image is lowered, and it is not clear what the value of signal 
to noise ratio will be for a single operator. However, this design is based on the 
use of multiple widths of operator and on a decision procedure which selects the 
smallest operator that has an output signal to noise ratio above a given threshold. 
This means that all operators will spend most of their time operating close to 
their output £ thresholds. We should therefore try to choose a spacing which gives 
acceptable multiple response behaviour under these conditions. 

A rough estimate for the probability of a spurious maximum in the 
neighbourhood of the true maximum can be formed as follows. Recall that maxima 
m an operator output correspond to zero-crossings in the derivative of this output. 

If we look at the first derivative of the response to an ideal step we find that 
it is approximately linear near the centre of the step. There will be only one 
zero-crossing if the slope of this response is greater than the slope of the response 
to noise only. This latter slope is just the second derivative of the response to noise 
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only, and is a Gaussian random variable with standard deviation 



while the slope of the zero-crossing at the centre of the edge is Af( 0). The 
probability p m that the former slope exceeds the latter is given in terms of the 
normal distribution function <& 



We can choose a value for this probability as an acceptable error rate and this 
will determine the ratio of /(0) to a s . Rearranging we obtain. 


-- = $ ‘(I-Pm) 

«o V-f— oo /" (*) dx 

And we can see the explicit dependence of this constraint on the image signal 
to noise ratio. We can eliminate this dependence by relating the probability of a 
multiple response p m to the probability of falsely marking an edge p/ where we 
define 




Pf = l — *(£) 


and we have finally that 


!/'( 0 )l = k J-oo /(«) dx 

\/j±“ f"\x) dx / 2 M dx 


(2.38) 


where A: is a constant determined by the values of the two probabilities. If we choose 
to set p m equal to pf then the value of k is one. Unfortunately, the largest value of 
k that could be obtained using the constrained numerical optimization was about 
.58. This corresponds to an inter-maximum spacing of 1.2 (in units of W). This is 
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the final form of linear operator that we will use. It is illustrated in the last of the 

series of graphs in figure (2.3). Its performance is given by the product of £ and A 
and it has the value 


£A — 1.12 (2.39) 

Inspection of the shape of this operator in figure (2.3) suggests that it may be 
possible to approximate it using a first derivative of a Gaussian G ' where 

G(x) = exp (-^) 

The reason for doing this is that there are very efficient ways to compute the 
two dimensional extension of the filter if it can be represented as some derivative 
of a Gaussian. This will be discussed in detail in chapter 5. We now compare the 
performance of a first derivative of a Gaussian filter with the optimal operator. The 
impulse response of the filter is now given by 



and the terms in the performance criteria have the values 


(2.40) 
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Figure 2.3. Optimal step edge operators for various values of k, from top 1 
Mttom they are k = 0.075, 0.15, 0.25, 0.5, 0.65, 0.7 


ili. 


































(2.41) 



The overall performance index for this operator is 



While the k value for this filter is, from (2.38) 


(2.40) 



The performance of this operator is worse than the optimal operator by about 
20%, and its multiple response measure k, is worse by about 10%. It would probably 
be difficult to detect a difference of this magnitude by looking at the performance 
of the two operators on real images, and because the first derivative of Gaussian 
operator can be computed with much less effort in two dimensions (but see section 

5.2), it has been used exclusively in experiments. The impulse responses of the two 
operators can be compared in figure (2.4). 


2.4. Finding an Operator by Stochastic Optimisation 

The previous section contained the derivation of a "closed form” for an optimal 
edge detector for step edges. Even in the derivation of this closed form for the 
operator, a numerical optimization was necessary to obtain the coefficients that 
appear in its analytic form. We saw that this method required the solution of 
very complex simultaneous systems of non-linear equations. It is likely that if the 
technique were applied to other problems it would seldom be possible to find closed 
form solutions for the operators. However, this does not mean that a useful operator 
cannot be derived using these techniques. There are two alternative approaches 
both of which were used in the derivation of the step edge operator, and which can 
be applied when the expressions become too complex to be solved. 

(.) The first of these was used in the previous section and involves the use of 
numerical methods for the determination of some finite number of parameter 
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a 



b 



Figure 2.4. (a) The optimal step edge operator, (b) The first derivative of a 
Gaussian 


values once the solution has been reduced to a parametric form. In fact 
even infinite dimensional objects, e.g. the impulse response of a filter, can be 
approximated by a finite dimensional discrete filter if appropriate constraints 
on the bandwidth (of the infinite filter) are met. All that is required is a 
deterministic criterion which can be applied to the parametric form of the 
operator and which measures the “goodness” of the operator with respect to 
that criterion. 

(ii) The second method is necessary when it is not even possible to write down 
a closed form for the criterion of optimality. This problem arises when the 
image model contains some random component (e.g. Gaussian noise) and it 
is then necessary to form criteria that reflect some meaningful statistics on 
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e behaviour of operator on an ensemble of images. Gaussian independent 
random processes are particularly easy to analyse, but even with Gaussian 
1 “- the closed form criteria for step edges led to very complex solutions. 

owever, m the further work section of this thesis we will propose a method for 
trans ormmg problems that involve certain stationary processes into equivalent 
probelms involving only Gaussian independent processes. 

In fact in the work leading up to this report, the second method was used successfully 

6 ° re " t f ° rm SOlUUOn USinS the 6rSt meth0d obtained. This is almost 
certainly the rule rather than the exception. While at best the stochastic method 

ea s o an approx.mate solution, and may not be feasible if the parameter space 

" P0 °; 7 COnditi ° ned> * " Sti11 f6lt that * “ « technique and may guide the 
search for an analytic solution. 

The stochastic method begins, as did the analytic method, with a model 

of the image. Again we consider a step edge with superimposed white Gaussian 

e seek a filter / which maximizes some criterion but in this case we 

cannot characterize the filter by its (infinite) impulse response. Instead we consider 

a -crete filter i.e. we represent the filter by its impulse response sampled at 

positions 0, r, 2r etc. Provided that the bandwidth of the corresponding continuous 

impulse response filter is less than the Nyquist frequency, the continuous filter 

- completely described by its discrete approximation. It turns out that for the step 

e ge operators, which have small bandwidth, only about 12 samples are necessary 

.s was „ot known before the optimization was done and 32 samples were used 
for the discrete filter. 


The optimization algorithm is essentially a hill-climbing search over the space 
o possible filters. It proceeds by continuously iterating through the following steps 

(■) Create a (discrete) noisy edge by adding Gaussian random numbers to the 
sampled values of a step edge. 


(ii) Convolve the filter with this edge, and evaluate the response. 

(iii) Perturb the filter coefficients (sampled values) by a small amount 
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(iv) Convolve this new filter with the edge, and evaluate the new response. 

(v) Change the filter based on the effects of the perturbation in (iii). 

Note that this procedure is not guaranteed to lead to a solution even in the 
case where the analytic solution space is convex. It differs from deterministic 
hill-climbing procedures in that the “hills” (the contours of constant evaluation 
in parameter space) are not fixed but vary from iteration to iteration. There is a 
random component in any particular evaluation caused by the presence of noise in 
the modelled image. We can only say that the limit of the mean of a number of 
such evaluations will be the contours that would be obtained from the deterministic 
criteria. In fact the magnitude of the changes caused by image variations greatly 
overshadowed the magnitude of the changes due to the perturbations in the filter 
coefficients. It was therefore important to apply the perturbed and original filters 
to the same image. 

To see when this method should converge, we assume that there exists a 
deterministic evaluation function F over the parameter space, and such that we 
can locally estimate the evaluation of an n-tuple of parameters p as 


Ei = F(?) + r 

where r is a random variable from some unknown distribution which models 
the effects of the image noise. If we now perturb the filter coefficients by some small 
6p, we obtain the new evaluation 


E 2 ~ F(p -j- <5j3) -f* r 


assuming that the value of r is constant (the image has not changed) over some 
small neighbourhood of p. If we subtract E\ from E 2 and divide by |<5p| we obtain 


E 2 -E 1 = F(p + 6p) - F(p) 

l<5p| |«?| 


(2.43) 
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Now 


Km +m - m 

|(5^| 


VF(p) . a 


where a is a unit vector in the direction of fp. By using n normal perturbations 
6pi with n corresponding unit vectors u, and forming the sum 


A (Ei — E{)Sp i 

si m 2 


n 

J2(VF(p) ■ a,)n, = vf(p) 


(2.44) 


we have formed an estimate of the gradient of the evaluation function at the 

pomt p m parameter space. Another way of forming an estimate of VF is to use 

perturbations which are randomly distributed. By randomly distributed we mean 

that each component of 6 p is an independent random variable with zero mean 

and variance c\. Then the expectation value of the perturbation weighted by the 
change in evaluation is 


^ 1^2 — Ui)op\ 


v J WFp 




So we can also achieve an estimate of the gradient of F by making random 
perturbations in the filter coefficients and weighting those perturbations by the 
change in evaluation. This method provides a more uniform coverage of the 
neighbourhood around a single parameter space point than does a particular choice 
of orthonorma. perturbations. The implementation uses random perturbations and 
a short term averaging filter to obtain an estimate of the gradient of F over several 
iterations. The filter used has a single pole (i.e. its response to an impulse is an 
exponentially decaying sequence), and can be described by the difference equation 


7t j)6P] 

<^| +■ %-i (2.46) 

where is the estimate of the gradient of / at the iteration, and the subscripted 
quantities E 2] , E u and % are the values of these quantities at the j th iteration p 

18 3 Ume C ° nStant bEtween 0 aud 1. ^ determines the “inertia” of the system 
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The algorithm performs a simple hill-climbing by taking the estimate of the 
gradient at each iteration and adding a multiple of it to the current value of p. 
However we have (necessarily) added a time constant in the estimator for VF so 
that we can obtain a continuously updating estimate while simultaneously climbing 
using the existing estimate. We now have a system with both “inertia” from the 
average over the previous iterations and a “viscosity” caused by the fact that this 
average decays with time (assuming f3 is less than 1). Therefore it is possible for 
it to overshoot a minimum, or even to oscillate several times before settling at the 
minimum. It was necessary to set the time constant empirically in order to obtain 
accurate estimates of gradient without excessive overshoot. The behaviour of the 
system is roughly analogous to a ball rolling along a contoured surface under the 
influence of gravity, with perfectly viscous drag. 

The major reason for resorting to stochastic methods for the optimization was 
that the evaluation criterion is a function of a particular response, rather than an 
estimate of the behaviour of the filter on a large set of inputs. But the abstract 
criteria should be the same. The heuristic criterion should evaluate both the error 
rate and the localizing ability of the operator. The criterion actually used is 

Ey — —5011 7lmaz| dmax (2*47) 

where n max is the number of local maxima that occur in a fixed neighbourhood 
of the edge, and d mai is the distance of the strongest maximum from the centre of 
the true edge. Note that the two terms in the expression are “penalty” measures, 
hence the two negations. 

Figure (2.5) shows the algorithm converging to a solution after the filter has 
been initialized to a difference of boxes. In figure (2.6) the initial filter coefficients are 
random and independent. It is worth comparing figure (2.5) with figure (2.3), which 
showed the best analytic form of the operator for various inter-maximum distances. 
It seems that the stochastic method moves through parameter space in such a way 
that it passes through several of these analytic optimal forms before reaching a 
global extremum. The two methods produce similar solutions even though their 
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criteria are slightly different. This is strong evidence that the form of the optimal 
e ectorrobust with respect to the actual choice of criteria, so long as the criteria 
depend on both error rate and localising ability. We will see further evidence of 
is in the work of Shanmugam et al (1979) in a later chapter. 
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8<RRT-HALF-FIX-32. 5740642> 

;; after 200 iterations, perfornance index is 393 



8<RRT-HRLF-FIX-32 5740642> 

;; after 3300 iterations, perfornance index is 312 



8<RRT-HRLF-FIX-32. 5740642> 

;; After 14100 iterations, perfornance index is 204 



»<RRT-HRLF-FIX-32. 5740642> 

;; After 43500 iterations, perfornance index is 136 



Figure 2.5. Convergence of the stochastic optimization procedure after 
initialization to a difference of boxes 
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3. Two or More Dimensions 


In one dimension we can characterise the step edge in space with one position 
coordinate. In two dimensions an edge also has an orientation. In this chapter we 
will use the term “edge direction” to mean the direction of the tangent to the 
contour that the edge defines in two dimensions. Suppose we wish to detect edges 
of a particular orientation. We create a two-dimensional mask for this orientation 
by convolving a linear edge detection function aligned normal to the edge direction 
with a projection function parallel to the edge direction. A substantial saving in 
computational effort is possible if the projection function is a Gaussian with the 
same a as the (first derivative of the) Gaussian used as the detection function. 
It is possible to create such masks by convolving the image with a symmetric 
two-dimensional Gaussian and then differentiating normal to the edge direction. In 
fact we do not have to differentiate normal to every possible edge direction because 
the slope of a smooth surface in any direction can be determined exactly from its 
slope in two directions. The simplest form of the detector uses this method. 

After the image has been convolved with a symmetric Gaussian, the edge 
direction is estimated from the gradient of the smoothed image intensity surface. 
The gradient magnitude is then non-maximum suppressed in that direction. The 
directional non-maximum suppression is equivalent to the application of the 
following non-linear differential predicate 


. do 2 


G*I 


0 


where n = which has the same zero-crossings as 


VS • V(VS • VS) = 0 (3.1) 

where S = G*I and where I is the image and G is a symmetric Gaussian. This is 
readily varified by using the substitution 
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A <? _ n • VS 
dn 

The form of non-linear second derivative operator used in (3.1) turns out to 

1 : 777; pr 7 ed by (lga) , Tone and Poggio 

"" p '»'“<“»> - -—- - 

This operator actually locates either maxima or minima, by locating the 

zero-crossings in the second derivative in the edge direction. In principle this 

opera or could be used to implement an edge detector in an arbitrary number of 

pensions by first convolving the image with a symmetric n-dimensional Gaussian. 

convou ion with an n-dimensional Gaussian is highly efficient because the 
Gaussian is decomposable into n linear filters. 

There are other more pressing reasons for using a smooth projection function 

. ' USS ‘ an - Wh6n W apply a linear °P-ator to a two dimensional image 

we orm a every point in the output a weighted sum of some of the input values. For 

he edge debtor described here, this sum will be a difference between local averages 

the different sides of the edge. This output, before non-maximum suppression 

represents a kind of movinf-r avpracrp ^ • 1 

moving average of the image. Ideally we would like to use 

*« «K- - of limited extent. It i, W„, 

ternary to window the projection function (see Hamming 1983). If the window 
" abruptly tru„„ M , ,. g . » „ i, ^ ^ ^ ^ ^ 

t» rrr ^ ^ ^ 

to the Gibb, phenomenon in Fourier theory. When no.-maximum ..pp,.^ „ 
Whed them variations t,„ d „ ^ ^ 

m severe cases are not even continuous. 

The solution is to use a smooth window function. In signal processing, typical 
windows used are the Hamming and Hanm.g windows. The Gaussian,, . Jab,, 
approximation to both of the.., „d ,, certainly h „ ^ > 

give, -P.W widu, (The Gauss!,. i. unique 

bandwidth and frequency). The effect of th, window function become, v „y 
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marked for large operator sizes and it is probably the biggest single reason why 
operators with large support were not practical until the work of Marr and 
Hildreth on the Laplacian of Gaussian. The perceptive reader will probably see 
the similarity between these smoothness constraints in the projection function to 
preserve continuity of contours in the edge direction, and the smoothness of the 
detection function implied by the addition of the multiple response constraint. 

It is worthwhile here to compare the performance of this kind of directional 
second derivative operator with the Laplacian. First we note that the two- 
dimensional Laplacian can be decomposed into components of second derivative in 
two arbitrary orthogonal directions. If we choose to take one of the derivatives in 
the direction of principal gradient, we find that the operator output will contain 
one contribution that is essentially the same as the operator described above, and 
also a contribution that is aligned along the edge direction. This second component 
contributes nothing to localization or detection, (the surface is roughly constant in 
this direction) but increases the output noise. This will be verified analytically in 
chapter 7. 

A version of the detector which used the Gaussian convolution followed by 
directional non-maximum suppression has been implemented and performed very 
well. Examples of its output will be given in chapter 6. While the complete 
detector includes multiple operator widths, orientations and aspect ratios, they are 
a superset of the operators used in the simple detector. In typical images, most of 
the edges are marked by the operators of the smallest width, and most of these by 
non-elongated operators. However, as we shall see in the following sections, there 
are cases when larger or more directional operators should be used, and that they 
offer considerably better performance when they are applicable. The key to making 
such a complicated detector produce a coherent output is in the design of effective 
decision procedures for choosing between operator outputs at each point in the 
image. 

3.1. The Need for Multiple Widths 

Having determined the optimal shape for the operator,we now face the problem 
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, : " £ ll * ” ldth »<*“» » - * h- detec tion/localitation 

trade.,, m . p „ Ucu| „ , pp |,„ 0 „ n . ta ^ ^ ^ ^ ^ ^ ^ 

ifferent hr ,«h edg, .ithi, „ im„e, » W1 „ b , „ mr ^ 

several —itt. of ope,,,,, |, tb , T ,„ dml| „„ „ „ 0[mto ^ 

use must be made dy„„r«ally by the alg.rdt.m and , hi , „ quir „ , 

«f the energy i„ ,b, „gi„ ,u,„„„ dmg ^ ^ ^ ^ 

••ergy k„™, ,b, signal t, ,M|„ „ ,., ch ^ „ 

.. then .« a ,h, P r, b , bil i ty di,„ ibolion „ ^ ^ . 

calculate the Polity „ . candidate «, [e b , ing , , dg , . gtan ^ 
this probability will be different for different operator widths). 

Since the a-priori penalty associated with a falsely detected edge is independent 
o e edge strength, it is appropriate to threshold the detector outputs on probability 
o error rather than on magnitude of response. Once the probability threshold is 
set, the minimum acceptable signal to noise ratio is determined. However, there 
may be several operators with signal to noise ratios above the threshold, and in this 
case the smallest operator should be chosen, since it gives the best localization We 
can afford to be conservative in the setting of the threshold since edges missed by 
the smallest operators may be picked up by the larger ones. Effectively the trade-off 
between error rate and localization remains, since choosing a high signal to noise 

10 M leadS t0 " ,0Wer error rate - but ^11 tend to give poorer localization 

since fewer edges will be recorded from the smaller operators. 

In summary then, the first heuristic for choosing between operator outputs 
is that small operator midths should be used whenever they have sufficient S 
is is similar to the selection criterion proposed by Marr and Hildreth (1980) 
for choosing between different Laplacian of Gaussian channels. In their case the 
argument was based on the observation that the smaller channels have higher 
resolution, ,.e. there is less possibilty of interference from neighbouring edges That 
argument is also very relevant in the present context, as to date there has been no 
consideration of the possibility of more than one edge in a given operator support 
nterestingly, Rosenfeld and Thurston (1971) proposed exactly the opposite criterion 
m the choice of operator for edge detection in texture. The argument given was 
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that the larger operators give better averaging and therefore (presumably) better 
signal to noise ratio. 

Taking this hueristic as a starting point, we need to form a local decision 
procedure that will enable us to decide whether to mark one or more edges when 
several operators in a neighbourhood are responding. If the operator with the 
smallest width responds to an edge and if it has a signal to noise ratio above the 
threshold, we should immediately mark an edge at that point. We now face the 
problem that there will almost certainly be edges marked by the larger operators, 
but that these edges will probably not be exactly coincident with the first edge. A 
possible answer to this would be to suppress the outputs of all nearby operators. 
This has the undesirable effect of preventing the large channels from responding to 
“fuzzy” edges that are superimposed on the sharp edge. 

Instead we use a “feature synthesis” approach. We begin by marking all 
the edges from the smallest operators. Prom these edges, we synthesize the large 
operator outputs than would have been produced if these were the only edges in the 
image. We then compare the actual operator outputs to the synthetic outputs. We 
mark additional edges only if the large operator has significantly greater response 
that what we would predict from the synthetic output. The simplest way to produce 
the synthetic outputs is to take the edges marked by a small operator in a particular 
direction, and convolve with a Gaussian normal to the edge direction for this 
operator. The o of this Gaussian should be the same as the a of the large channel 

detection filter. 

This procedure can be applied repeatedly to first mark the edges from the 
second smallest scale that were not marked by at the first, and then to find the 
edges from the third scale that were not marked by either of the first two etc. 
Thus we build up a cumulative' edge map by adding those edges at each scale that 
were not marked by smaller scales. It turns out that in many cases the majority 
q{* edges will be picked up by the smallest channel, and the later channels mark 
mostly shadow and shading edges, or edges between textured regions. 
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3.2. The Need for Directional Operators 

So far we have assumed that the projection function is a Gaussian with the 

Z ' “ ^ G ™ n ^ detection function. In fact both the detect 1 

and localization of the operator improve as the length of the projection functn 

mcreases. We now prove this for the operator signal to noise ratio. The proof for 
localization ,s similar. We will consider a step edge in the x direction which passl 
through the origin. This edge can be represented by the equation 


I(x, y) = Au^(y) 

re u i is the unit step function, and A is the amplitude of the edge as before 

,17™ “ 7 G ““'“ v *'“* -»~ 

r h “ w “ h * -*« "»p*« ~p.™ i, a., r) 

the response to the edge (at the origin) is ’ 


r- f-oo 

-oo 7-00 7(i, y) dx dy 
The root mean squared response to the noise only is 


<ci r^rt**)* 


•" of the inttg „ |Sj „ d „„ 

,„.W b, S. WO h,V, wh „ hwpm if „ lht rundta 

1 „ Tf” ** m ~ * h * — - *. Pfi-Con function h, 

replacing f(x, y) by f(x, f). The integrals become 


/- M /-oo / 7- 7 )dxdy = Z JZ f( x , yi ) l 


dx dyi 


n 00 


{Lee Loo f (x ’ l ) dx d y) - n Oo(/_ x fL x f(x, y 1 )ldxdy 1 j 
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And the ratio of the two is now \Z~IT,. The localization A also improves as 
\fl. It is clearly desirable that we use as large a projection function as possible. 
There are obviously practical limitations on this, in particular all edges in an image 
are of limited extent, and few are perfectly linear. However, most edges continue 
for some distance, in fact much further than the 3 or 4 pixel supports of most 
edge operators. Even curved edges can be approximated by linear segments at a 
small enough scale. Considering the advantages, it is obviously preferable to use 
the directional operators whenever they are applicable. The only proviso is that 
the detection scheme must ensure that they are used only when the image fits a 
linear edge model. 

The present algorithm tests for applicability of each directional mask by 
forming a goodness of fit estimate. It does this at the same time as the mask itself 
is computed. An efficient way of forming long directional masks is to sample the 
output of non-elongated masks with the same direction. This output is sampled at 
regular intervals in a line parallel to the edge direction. If the samples are close 
together (less than 2 a apart), the resulting mask is essentially flat over most of its 
range in the edge direction and falls smoothly off to zero at its ends. Two cross 
sections of such a mask are shown in figure (3.1). In this diagram (as in the present 
implementation) there are five samples over the operator support. 

Simultaneously with the computation of the mask, it is possible to establish 
goodness of fit by a simple squared-error measure. Since the quantity being estimated 
to produce the mask is the average of some number of values, the squared error is 
just the variance of these values. We then eliminate those operator outputs whose 
variance is greater than some fraction of the squared output. Where no directional 
operator has sufficient goodness of fit at a point, the algorithm will test the outputs 
of less directional operators. This simple goodness of fit measure is sufficient to 
eliminate the problems that traditionally plague directional operators, such as false 
responses to highly curved edges and extension of edges beyond corners, see Hildreth 
(1980). 

This particular form of projection function, that is a function with constant 
value over some range which decays to zero at each end with two roughly half- 
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directio g n, r (btcross re s C ectbn normalto £jjfdl^tSc ) 6 ^ i ° n P aralle . 1 the edge 
responses of several masks. g directl0n ( c ) Two-dimensional impulse 


Gau.ians, is very similar to a commonly used extension of the Hanning window. 

is a er unction is flat for some distance and decays to zero at each end with 
two half-cos,ne bells (Bingham, Godfrey and Tukey 1967). We can therefore expect 

‘ h r aV : g °° d Pr ° PertieS 38 a average estimator, which as we saw at the 

a ar of the chapter, ,s an important role fulfilled by the projection function. 

All that remains to be done in the design of directional operators is the 
specification of the number of directions, or equivalently the angle between two 
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adjacent directions. To determine the latter, we need to determine the angular 
selectivity of a directional operator as a function of the angle 0 between the edge 
direction and the preferred direction of the operator. Assume that we form the 
operator by taking an odd number 2 N + 1 of samples. Let the number of a sample 
be n where n is in the range —N ... -f iV. Recall that the directional operator 
is formed by convolving with a symmetric Gaussian, differentiating normal to the 
preferred edge direction of the operator, and then sampling along the preferred 
direction. The differentiated surface will be a ridge which makes an angle 0 to the 
preferred edge direction. Its height will vary as cos0, and the distance of the n th 
sample from the centre of the ridge will be nd sin 9 where d is the distance between 
samples. The normalized output will be 



cos0 
2AT + 1 



(nd sin 0) 2 \ 
) 


If there are m operator directions, then the angle between the preferred 
directions of two adjacent operators will be 180/m. The worst case angle between 
an edge and the nearest preferred operator direction is therefore 90/m. In the 
current implementation the value of d/o is about 1.4 and there are 6 operator 
directions. The worst case for 0 is 15 degrees, and for this case the operator output 
will fall to about 85% of its maximum value. 


3.3. Noise Estimation 

To estimate noise from an operator output, we need to be able to separate its 
response to noise from the response due to step edges. Since the performance of 
the system will be critically dependent on the accuracy of this estimate, it should 
also be formulated as an optimization. Wiener filtering is a method for optimally 
estimating one component of a two-component signal, and can be used to advantage 
in this application. It requires knowledge of the autocorrelation functions of the 
two components and of the combined signal. Once the noise component has been 
optimally separated, it is squared and locally averaged. In fact we can further 
improve the separation in the smoothing phase, since when we use the noise estimate 
we will be comparing it to the response of the edge detection operator at a local 
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Since the autocorrelation of the output of a filter 


equal to the autocorrelation of its impulse 


m response to white noise is 


response, we have 
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If 92 is the response of the operator derived in (2.38) to a step edge then 
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The filter K above is convolved with the output of the edge detection operator 
and the result is squared. The next step is the local averaging of the squared noise 
values. The averaging filter is basically a broad Gaussian, but its accuracy can be 
improved in this application by orthogonalizing it to the step edge response. Let 
the averaging filter be expressed as A\(x) —A^x) where 

Ai(i) = aiexp(-^j) 

A 2 (x) = a 2 exp 

and the a in the expression for Ai is the same as for the detection filter. 
(Actually, the optimal shape for A 2 is the square of the second derivative of a 
Gaussian, but the use of this function makes the scheme very sensitive to small 
variations in the position of the detection filter maximum). The constants a\ and 
02 are chosen so that the net response to the squared filtered step response is zero, 
i.e. 



/:>(-*)+*<-*>)(?* - * = ° 

Having formed an estimate of the local noise energy at every point, we can 
now deal with the problem of setting operator thresholds to achieve minimal error 
rate. This is the subject of the next section. 

3.4. Thresholding with Hysteresis 

Virtually all edge detection schemes to date use some form of thresholding. 
If the thresholds are not fixed a priori but are determined in some manner by 
the algorithm, the detector is said to employ adaptive thresholding. The solitary 
exception is the Marr Hildreth scheme, where edges are marked at any zero-crossing 
in the output of a Laplacian of Gaussian filter. This is not a practical proposition 
because there is a very high density of zero-crossings in the response to pure noise 
even if the noise has vanishing energy. Most practical implementations of this 
scheme use thresholding based on the slope of the zero-crossing. 
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3.5. Sensitivity to Smooth Gradients 


It has been pomted out in Binford-Hom (1973) that images frequently contain 
slow gradients, and that edge detectors which are sensitive to these gradients are 
prone to mark mult,pie edges in regions where the gradient is high. The edge operator 
derived in the last chapter will be sensitive to image gradients and we should now 
as ,f it ,s possible to eliminate this sensitivity without prejudicing performance. 
One possibility would be to use an operator which is a linear combination of 

two different widths of the optimal operator, such that the resulting operator is 
insensitive to gradients. Suppose the function / is given by 



Then we find that 



0 


So this function will certainly be insensitive 


.. . w gradients, and its performance 

will be g,ven by equation (2.12). The signal to noise ratio and localization are now 



A = 


'Ai + 4(^2 + %°\°\ + 




v/-i 


Asymptotically as o 2 tends to infinity, the above expressions tend to the 
limiting values 
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Which gives the same overall performance as a simple first derivative of 
Gaussian as given by equation (2.40). In practice a value of o<i of around 3ai is used 
to reduce the computational expense and to prevent the operator becoming too 
sensitive to nearby edges because of its large support. The overall performance EA 
is reduced by about 30% in this case. Note also that as 02 approaches o\ we obtain 
a third derivative of a Gaussian, which is similar to the operator sometimes used 
to estimate the strengths of Marr-Hildreth zero-crossings. But the performance is 
reduced in this case, as will be shown in chapter 7. 

The fact that / is insensitive to gradients implies that / may be expressed as 
the derivative of a function g which has zero mean value, and which is symmetric 
because / is antisymmetric. A symmetric function g with zero mean value may be 
thought of as a lateral inhibition operator as described by Binford (1981). Lateral 
inhibition was proposed as a mechanism for reducing sensitivity to gradients. But 
the form of the lateral inhibition operator was not determined analytically when 
in fact it has a direct effect on the performance. 
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4. Finding Lines and Other Features 

Chapters two and three described in some detail the derivation of an optimal 
operator for step ed g es in Gaussian noise. The derivation of the analytic form of 
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4.1. General Form for the Criteria 

When we derived the analytic criteria for step edges in chapter 2, there were 
only two places where the form of the input waveform actually affected the criteria. 
By inserting a general feature in place of the step edge we can readily obtain a 
general criterion. Recall that the definition of the signal to noise ratio £ was the 
quotient of the responses of the operator to the input waveform and to noise only. 
The response to noise for an operator with impulse response f(x) will be given by 
equation (2.4), and is 


n 0 




i 


The response of this operator at the “centre” of an arbitrary waveform F(x) is 
similar to equation (2.3) and is just 


/ + F{—x)f(x)dx 

J —00 

So the signal to noise ratio for / and any feature F is (assuming no = 1) 

E=(41} 
' Si* 

The method for determining the localization of the operator is also similar to 
that used in chapter 2, and we will not describe it fully here. Recall that localization 
was defined as the reciprocal of the standard deviation in the position of the marked 
edge relative to the true edge. To find maxima in the operator response we actually 
locate zero-crossings in the derivative of its response. Localization was defined as 
the quotient of the slope of the zero-crossing and the root mean squared noise in 
the differentiated response. The latter is given by equation (2.7) and has the value 
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The slope of the differentiated operator reponse is 


dP_ 

dxl 




L — dx = S-oo F (~ x )f"( x ) 


dx 


And so the localization A becomes (assuming 7 ig — 1) 


A = 


= J+” F(~x)f"(x) dx 

'Jl+Z f' 2 (*) dx 


(4.2) 


The final form of the composite criterion can now be written as the product of (4.1) 
and (4.2) thus 


EA = & F (~ x )f( x ) ^ J+” F(-x)f"(x) dx 

\fc PP) dx \//-» f' 2 {x) dx (4 ' 3) 

Thus finding an arbitrary feature detector requires the maximization of this 
functional, subject possibly to some subsidiary constraints such as the multiple 
response constraint (2.25). This is difficult in general, even if the feature F is 
particularly simple, like a step edge. However the form of the functional (4.3) is 
simple enough that given a candidate feature detector we can readily evaluate its 
performance analytically. If the operator impulse response / and the feature F 
are both represented as sampled sequences, evaluation of (4.3) requires only the 
calculation of four inner products between sequences. 

This suggests that numerical optimization can be done directly on the sampled 
operator impulse response. This method can be expected to be much faster than 
the stochastic optimization since the evaluation of performance is exact, and the 
gradient at each point in function space can be accurately estimated. At the same 

time it is very general in that optimization for any waveform only requires a sampled 
version of the waveform. 

The output will not be an analytic form for the operator, but an implementation 
of a detector for the feature of interest will require discrete point-spread functions 
anyway. It is also possible to add additional subsidiary constraints by using a penalty 
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method (see Luenberger 1973). In this method the constrained optimization is 
reduced to one (or possibly several) unconstrained optimization. For each constraint 
we define a penalty function which has a non-zero value when one of the constraints 
is violated. We then find the maximum of 

S(/)A(/) - H>P(f) 

Where P is a function which has a positive value only when a constraint is 
violated. The larger the value of /ij the greater the likelihood that the constraints 
will be satisfied, but at the same time there is a better chance that the method will 
become ill-conditioned. A sequence of values of may need to be used, with the 
final value from each optimization used as the starting value for the next. The /i» 
are increased at each iteration so that the value of P(f) will be reduced, until the 
constraints are “almost” satisfied. 

An example of the method applied to the problem of detecting “ridge” profiles 
is shown in figure (4.1). The function F for a ridge is defined to be a flat plateau 
of width w, with step transitions to zero at the ends. The auxiliary constraints are 

(i) The multiple response constraint. This constraint is taken directly from equation 

(2.25), since it does not depend on the form of the feature. 

(ii) The operator should have zero DC component. That is it should have zero 

output to constant input. 

Since the optimal operator for ridges is also symmetric, it will have zero 
response to a constant gradient. This means that it can be represented as the second 
derivative of a function of finite extent, which in turn suggests that there may be 
economical ways of computing operators for several orientations in two dimensions. 

The figure shows two different operators derived for the same feature. The two 
operators differ in the size of their possible support. The second is constrained to 
lie within a region twice the width of the ridge, while the second has a support 
three times the ridge width. The performance of the second operator is very slightly 
worse than the first. However the fact that it requires a smaller support means that 
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Figure 4.1. A ridge profile and the optimal operator for it 


“ J •" b ' adjto.l Thl , 
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between ridge, to ridge width, i.e, i, to ridge, to vnlloy, have the 

width. 


mce the w.dth of the operator is determined directly by the width of the 
ndge, there is a suggestion that several widths of operators should be used. This has 
not been done m the present implementation however. With this ridge model a wide 
n ge can be centered to be two closely spaced edges, and the implementation 
a ready includes detectors for these. The only reason for using a ridge detector is 
that there are ridges in images that are too smail to be dealt with effectively by the 
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Figure 4.2. A roof profile and an optimal operator for roofs 

narrowest edge operator. These occur frequently because there are many features 
(e.g. scratches and cracks or printed matter) which result in discrete contours only 
a few pixels wide. 

A similar procedure was used to find an optimal operator for roof edges. These 
features typically occur at the concave junctions of two planar faces of an object. 
The results are shown in figure (4.2). Again there are two subsidiary constraints, 
one for multiple responses and one for zero response to constant input. Note that 
the difference between the two operators is essentially their “resemblance” to their 
respective inputs. We would expect this from the theory of Wiener filtering. The 
optimal Wiener filter for a signal in white Gaussian noise is just the time-reversed 
signal. Wiener filtering considers only signal to noise ratio however, and the 
localization and multiple response criteria impose effective smoothness constraints 
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on the operator. 

A roof edge detector has not been incorporated into the current edge detector 

ecause ,t was found that ideal roof edges were reiatively rare. In any case the ridge 

e ector ,s an approximation to the ideal roof detector, and is adequate to cope 

. t m The situation may be different in the case of an edge detector designed 

explicitly to deal with images of po.yhedra, like the Binford-Horn line-finder (1971) 

Here several width of roof operator may be desirable to deal with different signal 
to noise ratios in the image. 

The method described above has been used to find optimal operators for both 
n ge and roof profiles and in addition it successfully finds the optimal step edge 
operator derived m chapter 2. It should be possible to use it to find operators for 
ar . vary eatures, and for optimal step operators to deal with non-white noise. For 
exampie the problem of detecting step edges in "blue” noise (uncorrelated noise 
a has been passed through a perfect differentiator) reduces to the problem of 
detecting roof edges in white noise. So the optima, detector for step edges in this 
case is the derivative of an optimal roof operator. Note that it is not the same roof 
operator which we derived here because the latter includes the zero DC response 
constraint, which does not translate to something useful for the step operator. 

4.2. In Two Dimensions 

We now face the problem of extending the one-dimensional ridge operator 

to two dimensions. As in the case with step edges, the extension is an operator 

composed of a detection function normal to the ridge direction and a projection 

unc ion parallel to it. As before we non-maximum suppress the output of the 

convolution of the image with this mask normal to the edge direction. The maximal 

points can then be threshold (with hysteresis) and the marked ridge contours can 
be combined with the edge map. 

In the case of ridges however it is much harder to obtain an accurate estimate 
° e r, ge ,section. There is no simple measure like the gradient direction which 
aligns with the normal. While it is true that the larger principal curvature will be 
normal to the ridge direction, this is a much less reliable quantity to measure from 
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even a smoothed image. Remember that the ridge detector will be operating below 
the resolution of the smallest step edge operator, so the degree of smoothing will 
be slight. This suggests that it will be necessary to use several oriented masks at 
each point and to choose the one which best fits the ridge locally. This has been 
found to be an inadequate solution because it performs so poorly when the ridge is 
highly curved, as is generally the case for printed text. While the highly directional 
masks have advantages for long straight ridges, they are not adequate as general 
ridge detectors. 

In practice a measure similar to the curvature must be used. The direction of 
principal curvature cannot be used directly, and not merely because it is a noisy 
measure. The peak of a ridge should be approximately flat in the ridge direction 
but highly curved normal to this direction. But there are points on the sides of 
the ridge that are approximately planar. Here the direction of greatest curvature 
will be arbitrary. It is quite possible that it will happen to be parallel to the ridge 
direction and that there may be a slight maximum in this direction, and hence a 
ridge point will be marked. 

To prevent these erroneous points from being marked it is necessary to modify 
the ridge direction estimate so that it takes into account the slope of the ridge 
normal to the direction of greatest curvature. This slope will be approximately zero 
if the point is at the top of the ridge, but for points on the side of the ridge where 
the greatest curvature may lie parallel to the ridge direction, the slope normal to 
this direction (which is the slope of the ridge face at that point) is large. So instead 
of non-maximum suppressing in the direction of maximum curvature, we use the 
direction n which maximizes 


<Pl 

dn 2 


+ aabs 


dl 

.dn-L. 


where n^- is the normal to n, and a is some positive constant. 


So in regions of low curvature, the above measure chooses a direction in which 
the slope is large, which is the correct behaviour for points on the sides of a ridge. 
This method has been used and seems to behave quite well even on difficult ridge 
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merging feature descriptions, where there is no a priori reason to prefer one feature 
to another, it should be symmetric. 

For example, suppose a ridge detector and a step edge detector both respond 
to a feature that is roughly a step edge. From the edge points marked by the 
edge detector, we synthesize the ridge detector output that would have occurred 
had the image actually contained a step edge. Then the ridge detector output is 
compared with the synthetic output and if it is not significantly greater, no ridge 
point will be marked. Similarly from the ridge detector output, we reconstruct the 
edge detector output that would have occurred if the ridge detector accurately 
described the image. The edge detector output will (in this example) be much 
stronger than the synthesized output and so an edge point will be marked. This 
method has the advantage that it is possible to mark the occurrence of more than 
one type of feature at a point in an image. It has been found that such points do 
occur in images (Herskovits and Binford 1970) in particular roof edges are often 
superimposed on step changes and “edge effects” which are similar to ridges also 
often accompany step changes. 

It is necessary to consider ridges and valleys as different features as the detector 
may output both kinds of interpretation near a single feature in the image. Some 
form of integration technique such as feature synthesis or goodness of fit testing 
must be used. This has not been done in the present implementation, which only 
integrates one of these features with the step edge map. This is one area where a 
lot of work still remains to be done, and several feature integration techniques need 
to be tried. 
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5. Implementation Details 


The ultimate test of any edge detector is its performance in some application 
on ma images. The translation of a derived operator into a program is non-trivial. 

running version of the program is actually very small, it is still the result 
much refinement. The refinements are as much a part of the design of the edge 

6 o' ", “ " the the0retical ana| y sis Panted in the first four chapters. It is 
not the intent of this chapter to describe any such program. It will describe in an 

a stract W ay several algorithms for implementing some of the processing required 

by the edge detector. The author feels that this is necessary for several reasons 

0 ) Since the edge detector involves a considerable amount of computation, especially 

convolution, it ,s important that efficient algorithms be used if it is to run in 
a reasonable amount of time. 

(ii) Because images may contain very fine detail it is important that local operations 
involve the minimum number of pixels, but provide the best accuracy from 

hem. This applies in particular to operations such as the calculation of 
directional derivatives and non-maximum suppression. 

H Mf, m nol Th , M bat |n mM 

“ “■* lh *«"* •“*' - * , iom 

give consideration to the choice of representation of its output. 

This chapter will not describe in detail all the aspects of the present edge 
detection scheme, in fact some of the algorithms presented do not even form part 
of the scheme, but may be used in future implementions. Instead it will focus 
on two or three of the more critical aspects of the implementation, in particular 
on efficient methods for convolution with Gaussians, and on the details of the 
non maximum suppression scheme. It will close with details of a control abstraction 
for the programming of local parallel operations which are used extensively by the 


70 



5.1. Effects of Discretization 


All of the analysis to date has assumed that the image was a continuous 
differentiable surface and that the edge operators were likewise continuous functions. 
Since most implementations of the detector on digital computers will employ 
discrete filters applied to sampled image data, it is necessary to find accurate 
discrete approximations to the continuous filters. It is also important to consider 
the effect of the smoothing filter (if any) which was applied to the image before it 
was sampled. Smoothing filters are necessary to prevent aliasing of high frequency 
components in the sampled signal. Suppose an image is smoothed, sampled and 
convolved with a discrete filter. By the associativity of convolution, this is equivalent 
to convolving the image with a filter that is the result of the convolution of the 
smoothing filter and the discrete filter. If the image can be smoothed with a 
Gaussian smoothing function, no convolution is necessary at that scale. 

The simplest way to approximate a continuous filter with a discrete filter is to 
sample the former. If this method is to succeed, we must ensure that the sampling 
does not introduce aliasing. Consider a continuous first derivative of Gaussian filter 

of the form 



Suppose that the image is sampled at intervals of r, (the filter must also be 
sampled at this rate for discrete convolution) then the Nyquist frequency is The 
effective bandwidth of the filter should be less than half this frequency to prevent 
aliasing. The Fourier transform of this filter is 



The cutoff frequency is f , and substituting we find that the amplitude at cutoff is 
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Th,s function never reaches zero amplitude for large w , but it does approach 
zero very rapidly. We can set the effective cutoff frequency at the point where this 
unction falls to 0.01 of its maximum value. This limits the smallest value of er that 
we can use for a given sampling rate. The minimum value of <r is approximately r. 
Th |S ' S m fact the srna llest operator size used in the present implementation. 

A second problem arises when we try to approximate infinite Gaussians with 

finite impulse response filters. Once again we can exploit the fact that the Guassian 

ec^s to zero very rapidly in the spatial domain. In the current implementation, 

e Gaussian is truncated at about 0.001 of its peak value. This constrains the 

ratio of the number of samples in the (discrete) impulse response to the width <r of 

the filter. This ratio is typically 8, e.g. to approximate a filter with a o of 2.0 it is 
necessary to use at least 16 samples. 

Finally, it should be mentioned that it is sometimes possible to dispense with 

the convolution step entirely. If the desired value of <r is much less than r, discrete 

convolution is not practical. However, an equivalent convolution may be performed 

w,t a continuous filter (the smoothing filter) before the image is sampled. Since 

m theory the smoothing step is necessary anyway, no extra computational effort is 

required. We find m fact that the smoothing function is determining the performance 

of the subsequent edge detector, and the use of Gaussian smoothing should give 
near optimal step edge detection. 

5.2. Gaussian Convolutions 

It is perhaps surprising to see an entire section devoted to what seems 
a very straightforward and specific computation. However, there are several 
interesting properties of the two-dimensional Gaussian that suggest fast algorithms 
or convolution. In particular, the central limit theorem implies that repeated 

convolution with any finite filter tends in the limit to a Gaussian convolution. We 
begin with a review of discrete convolution. 

5.2.1. Discrete Two-Dimensional Convolution 

The output of the convolution of a discrete image 7(n, m) with a two-dimensional 
niter is given by the double summation 
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M. M. 

0(n,m) = Y, Y, /(n — t,m — y)/(i,y) 

assuming that the filter f has the same size Af in both dimensions. This method 
requires M 2 multiplications and slightly fewer additions for each output point 
computed. It is a general method and will work with any two-dimensional finite 
impulse response filter. 

5.2.2. Convolution using One-Dimensional Decomposition 

We now consider a more specialized form of convolution which is applicable to 
a limited subclass of two-dimensional filters. The subclass is the class of separable 
two-dimensional filters. This class is characterized by the decomposition of their 
impulse responses into independent linear filters 


/(m) = /xM)*/ y (o,y) 

where the * denotes convolution, and the filters f x and fy have only M non-zero 
components. By using the associativity of convolution, we break the convolution of 
an image with the two-dimensional filter into two convolutions with linear filters 


I * f — I *[fx * fy\ — * fx] * fy 

This method requires only 2 M multiplications and about the same number of 
additions per point. For large operator sizes (values of M of 64 are common) this 
method is substantially faster than the naive method, but is limited to separable 
filters. The number of useful members of this class is actually quite small, and 
the Gaussian is in fact the only two-dimensional symmetric function which can 
be decomposed in this way. Other useful separable functions include the first 
directional derivatives of the Gaussian in the x and y directions. 

5.2.3. Recursive Filtcrring 

So far we have been approximating the infinite Gaussian function with finite 
impulse response filters. It seems that the approximation could be more accurate 
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if we were to use infinite impulse response (recursive) filters instead. We can again 

make use of the separability of the two-dimensional Gaussian, and can therefore 

reduce the filter design problem to that of designing a one-dimensional filter 

which approximates a Gaussian. The infinite impulse response (IIR) filter can be 
characterized by the equation 


z p 

y(n) = a t z(n — i) + bjy(n — j) (5 2 ) 

where x(n) and y(n) are the input and output respectively at the n th point. This 

filter is roughly equivalent to a continuous filter having P poles and Z zeros. The 

positions of the poles are determined by the coefficients b, while the zeros are 
determined by the <z,. 


The immediate drawback of using such a filter to approximate a Gaussian is 
that the filter is infinite “in one direction only”, and that the Gaussian has an 
impulse response that extends to infinity in both directions. The solution to this 
problem is illustrated in figure (5.1). We employ two recursive filters moving in 
opposite directions, each of which has an impulse response which is approximately 
a half-Gaussian. We then sum the two responses (and subtract a component at 
the centre point which is doubled) and are left with a first approximation to the 


symmetric Gaussian. We can if we wish repeat this process on the filtered image 

and (by the central limit theorem) we will obtain a very close approximation to the 
Gaussian, as shown in the last frame of figure (5.1). 


The half-Gaussian is approximated by a damped exponential cosine, which 

requires two poles and two zeros in the recursive filter. The 6 coefficients are derived 

by considering four discrete output values near a zero-crossing of the response. We 
choose the values to be 


exp(ar) sin( ojt) 0 exp(-ar) sin( W r) exp(-2ar) sin(2wr) 

Application of equation (5.2) to the first three and last three values gives two 
equations each of which involves only one of the 6,,and the solution is 
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Figure 5.1. (a) and (b) Recursive Half-Gaussian filters moving from left to right 
and from right to left respectively, (c) sum of these, (d) Result of two applications 
of recursive filter and (e) True Gaussian. 
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— 2 exp(ar) cos(a;r) 


&2 = — exp(2ar) 


where a and « are the decay constant and angular frequency respectively, of the 
damped exponent.al cosine response. Typical values for a and u are 


a = — 
a 


w = 0.8a 


(5.3) 


where o is the standard deviation of the equivalent Gaussian. The a, determine 

e gam of the filter and its first derivative at the origin. For unit gain and best 
approximation to the slope of a Gaussian we use 


a 0 = 1.0 


(ol 2 t 2 \ 

ai = exp^—J-*! 


(5.4) 


The interesting feature of this method is that its complexity is independent 

a a!!- aCt ^ 3 SmgIe PaSS approximation > squires only 12 multiplications 
and additions per point (3 each for filtering in four directions). It also requires an 

ex remely small number of array references, and the number may be reduced even 

further by saving previous x and y values in registers (only three are needed). It is 

poss, e to implement the algorithm using only 4 references per point. In practice 

. is usually better to make two passes over the image, so the above figures should 
be doubled. 

This particular method is of course even more specialised than the previous 

methods, and is only useful for Gaussians and certain other infinite functions 

However it has the lowest complexity of any algorithm discussed, and is very 

economical with regard to memory references. It has the additional advantage that 

the fi,ter sise can be varied by simply changing the value of some parameters, 

without affecting execution of the algorithm. It would seem to be the first choice 
lor any future implementations. 
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5.2.4. Binomial Approximation 

In the last two sub-sections we saw that by using the special properties of 
Gaussians we were able to reduce the number of multiplications required to perforin 
convolution. The resulting algorithms were no longer general convolutions but were 
restricted to subclasses of two-dimensional filters. It is possible by exploiting the 
full power of the central limit theorem to produce an algorithm that requires no 
multiplication at all. Recall that repeated convolution with any spatially limited 
filter tends to an equivalent Gaussian convolution. If we choose the filter to be 
addition of two adjacent points, and if we repeat the addition many times, we can 
obtain a Gaussian approximation without multiplication. 

A useful analogy to this method exploits the equivalence of discrete convolution 
and polynomial multiplication. The filter produced by the addition of two consecutive 
samples is isomorphic to multiplication by the polynomial (z + !)• If the filter is 
applied n times it is equivalent to multiplication by the polynomial (x -j- l) n . The 
coefficients of this polynomial are given by the binomial theorem. We then use 
the fact that for large n, the binomial distribution may be approximated by a 
Gaussian. This method is economical in terms of multiplication, but its complexity 
is relatively high. Since the variance of two distributions add when the distributions 
are convolved, the standard deviation a only increases as the square root of the 
number of convolutions. The value of a after n applications of the addition filter is 



To phrase this result in the terms used in the rest of this section, we find the 
number of samples M required for an equivalent discrete convolution. Since M is 
typically Sa, the relationship between the number of additions per point n and the 
equivalent mask size is 


9 9 

n = —M 2 
64 


The overall complexity of this algorithm for two-dimensional convolution 
(assuming it is used with decomposition) is additions per point, with the 
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■ However both the number of additions and the number of memory referenees 

grows with M even though we are exploiting separability. Since the time required 

or floating pomt addition is often not much lower than the time for a multiply 

is method rapidly becomes unattractive as M increases. 

5.2.5. Fast Convolution 

In recent years, very fast algorithms have been developed Tor integer or 
Hy.om„l multiplication (Sehonhag, and S„„„„ 1971) , A ,y„ plo , kd|j , ^ 

gorithm, are 0 „ i„ , h , „„gt h 0 , ,„„ e „ being „„,tip,i,d, 

convolution, we typically „ „ ueh ^ ^ 

• ..pnt, and we should therefor, expect the asymptotic 

“ ^ °' ‘ h * ^ - «hi,v, anywhere a. 

asymptotic complexity however, would p t .h lbiUv „ y h , g „ OT , tol ,. 

But all is not lost. By using the simplest form of fast mnltiply, 

a very useful speedup with relatively low overhead Consider t 

... overneaa. Consider two sequences of 

numbers which are to be convolved. We assume w.l.o.g. that the length of the 
sequences is 2n, and we break each sequence into two subsequences of length n. 

and JtZ” 8 ^ ' aDd “ d deDOte ^ SUbSeqUenCeS as -d * 


*-*ii„+i 2 and y = j , 1 <| n+! , 2 

where V denotes left-shifting of the sequence by n. We then use the distributivity 
of convolution over addition, and we have * 


**y (*1 in +x 2 ) * (yi X +j, 2 ) = x t * yi X n +( Xl *y i + Xi > yi) X +X} , y2 
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From which it would seem that computation of a 2 n length convolution requires 
4 convolutions of length n, implying an order of growth of n 2 . But the above 
expression can also be written as 


X * y = X! * 2/1 <i 2n +[(*i — x 2 ) * [y 2 — yi) + Xi * yi -f x 2 * y 2 )\ % 4**2 * 2/2 

Which requires only 3 convolutions of length n, plus some extra addition 
and subtraction. We can recursively apply this technique to compute these shorter 
convolutions, and we arrive at an algorithm with lower order of growth than n 2 . 
This leads to recurrence relationships for the number of multiplies and additions 
to perform a convolution of length n 




n 

2 




+ 4n — 4 


where C m is the number of multiplications and C a the number of additions for 
an n-point convolution. When these recurrences are expanded and simplified we 
obtain for the multiplication and addition complexity 


C m [n] = 3 l (=» 3 log2 < n) = n^H s» rt 1 ' 6 


C a [n] = 6.3 L — 8.2 L -f 2 6n 16 for large n (5.5) 

where L is the smallest integer greater than or equal to log 2 (n). To translate 
these results into the context of convolution of a long sequence (say length ni) 
with a much shorter one (length n 2 ), we assume n = n 2 . We will require — 
such convolutions, and each one will require about nj' 6 multiplies. The resultant 
complexity is so we complexity is linear in the 


79 



eng h of the longer sequence but only varies as the square root (roughly) of the 
orter sequence. For two-dimensional convolution, similar results hold and the 

~ COmPleXity " mU,tiPHeS PW While the addiU ° n “-P'exity 
IS about six times this. 

Thus we have the remarkable result that this method has almost the 
same complexly as convolution with one-dimensional decomposition, but uses 
P e e y general two-dimensional mask. The algorithm has been implemented 

"" 7 y “““ ““ “ , IduI « d : vt “ d 

VO u ion at about n _ 16. At n = 1024, the speedup is five to six fold It can 

* i» 6, ,p«,. ,„ d a . p[ m>mory ^ 2 

same as the number of additions. 

eon ?? ° f t> " make this me thcui tuU, than 

7 lh “ *>»»* 
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z: ;;,r,T ■'“* ■* h - * “ —»>... c 

Z K ; T 10 £ ""* 1 “» ,6 to ,6000 

fom " Z " “ ' T ■“* “ —» Hat have eaae.l, 

™, L! „T, TT de ‘““°” ° p " ,l ° r ' ““ «»•«.... 

wnue the fast convolution algorithm has nnf w k 

, , , .. . S nm Has not yet been incorporated into the edge 

detector, it is certainly worthy of further experiment. 

5.2.6. Sub-Summary 

intereT™" A S6Ct '° n ^ highHghted the faCt that ■« frequently manifold 
restingimplementations of seemingly mundane operations, e.g. convolution. But 

^has another purpose. When trying to solve a vision problem the first consideration 

feasibility ^ Sh ° Uld thiS algoHthm The second is 

lee ^ A Ca “ C ° mPUted fr ° m “ ifflage - ^ese two have 

s ould there be any constraints imposed by tractability, i.e. what 

, 6 C °“ PUted m reaSOnab ' e time ' The choice of algorithm should not be 
P ejudiced by considerations of efficiency until much consideration has been given 

imp ementation, and only if it seems likely that no efficient algorithms exist 

or he computation. This theme is characteristic of the work of Marr (1976) who 
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argued strongly for a breakdown of image processing problems using the above 
considerations. 

5.3. Non-Maximum Suppression 

The optimal edge operator was derived under the assumption that edges 
would be marked at maxima in its output. For two-dimensional images, finding 
these directional maxima is straightforward but there has been quite a bit of 
experimentation with various non-maximum suppression schemes. The operation 
should be as local as possible i.e. it should rely on pixels that are close to the 
potential edge point, but it should also be robust and accurate. 

The non-maximum suppression scheme described here may be used in either 
of two ways. In the first instance the edge direction is estimated from the gradient 
of a Gaussian-smoothed image surface by simply differentiating in the x and 
y directions. The gradient magnitude is then non-maximum suppressed in the 
gradient direction. This is just a possible implementation of equation (3.1). In the 
second case, the algorithm is used for non-maximum suppression of the outputs 
of directional masks. Here the gradient direction is fixed and is a property of the 
operator. We again non-maximum suppress the gradient magnitude, which in this 
case is the magnitude of the response of that operator. 

In either case the algorithm is the same. It uses a nine-pixel neighbourhood as 
shown in figure (5.2). The normal to the edge direction (either the gradient or the 
prefered operator direction) is shown as an arrow, and it has components [u X) u y ). 
We wish to non-maximum suppress the gradient magnitude in this direction, but 
we have only discrete values of the gradient at points Pi,r We require three points 
for non-maximum suppression, one of which will be Px,y and the other two should 
be estimates of the gradient magnitude at points displaced from P XtV by the vector 
u. 


Now for any u we consider the two points in the 8-pixel neighbourhood of P x>y 
which lie closest to the line through P x>y in direction u. The gradient magnitude 
at these two points together with the gradient at the point P x>y define a plane 
which cuts the gradient magnitude surface at these points. We use this plane to 
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Figure 5.2. Support of the non-maximum suppression operator 

locally approximate the surface, and to estimate the value^T^^Ttol 
For example, m figure (5.2) we estimate the value of a point in between P I>J(+1 and 
P *+i,y+i that lies on the line. The value of the interpolated gradient is 


u. 


*-=o(. + i., + i J + V^i + D 


U, 


Similarly the interpolated gradient at a point on the opposite side of P x is 

x,y 
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Uy — U X 


G(x,y— 1) 


G 2 = — G(x — l,y —1) + 

Uy Uy 

We mark the point Pi,y as a maximum if G(x,y) > G\ and G(x,y) > G 2 . 
The interpolation is similar for other gradient directions, and it will always involve 
one diagonal and one non-diagonal point. In practice, we can avoid the divisions 
by multiplying through by u y . 

This scheme involves five multiplications per point, but this is not excessive, 
and it performs much better than a simpler scheme which compares the point P XtV 
with two of its neighbours. It also performs better than a scheme which used an 
averaged value for the gradient along the edge, rather than just the value at P Xty . 

5.4. Mapping Functions 

We close this chapter with a brief discussion of a general approach to program 
structure for image processing algorithms. The development of a low-level vision 
program requires many repetitive operations on each point of the image. There 
will be some set of dependencies between the results of these computations, which 
implies that there is a natural (partial) ordering of computations, and that some 
intermediate results must be computed and saved somewhere. Aside from any 
hardware (or object code) considerations, there are two basic ways of implementing 
a software interface for this kind of processing. 

The first and most obvious way is to provide a set of primitive array operations, 
such as addition or convolution, which take arrays as arguments and store results 
into arrays. Programs written using these primitives look like normal sequential 
assembly code, unless there is some complex function (built from the primitives) 
which must move over the array in a non-standard way. In this case the arguments 
describing the way the function is to be moved must be repeated in each call to a 
primitive. This leads to cumbersome and non-orthogonal code. 

The second approach is to provide a mapping function which takes a local 
image processing function as an argument and moves it over some number of arrays, 
with the motion specified by other arguments. The two operations of mapping 
and local computation are now handled by separate functions. This method is 
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rem,n,scent of the MAP- functions in Lisp, (Moon, Stallman and Weinreb 1983 ) 

and is similar in philosophy to the concept of iterators in CLU (Liskov et al. 1979 ). 

It has a number of advantages at the user level. The first is that the local function 

can be written in the source language e.g. Lisp (even if the mapping function 

turns ,t into something quite different) and tested on a sample set of arguments 

W'tl'out having generate images. The source code is more compact and 

ess error-prone, and subjectively more readable. If parallel processing hardware is 

available, the source code would compile into something like the code using the 

first method. There would be no significant differences in the execution efficiencies 
of the two methods. 

For a serial machine though, there are concrete advantages in directly 
implementing something like the second method. Since the local function is 
evaluated completely at one point before moving to the next, there is only one 
storage location required for each intermediate result. This contrasts with the 
former method which needs a block of storage for each intermediate result The 
intermediate values may be held in high-speed storage (registers or cache) which 
greatly reduces the number of memory accesses required to apply the function to 
a u 1 image. There is also a very considerable speedup possible when the local 
unction uses conditional branching, such that the time to process an image point 
epends strongly on the data values at that point. For (most) parallel machines 
the time to apply the function to n points will be the worst case time for a 
one-point application. Conditional branching must be accomplished by setting a 
non-execution flag for the length of the code to be skipped. No advantage can be 
taken of sparse data, or computional shortcuts such as recursive filtering. 

It is the author's experience that the latter style makes code development much 
easier (this was a serious consideration in the implementation of the algorithm, 
much of which is written in Lisp machine microcode). This is true to the extent 
that some of the more complicated functions, such as the sparse directional masks, 
could probably not have been implemented using the first approach because of the’ 
sheer amount of code. The edge detector has been implemented partially using the 
rst approach, and fully using a mapping function (which is itself microcoded). The 
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mapping function maps over any number of arrays, with arbitrary increments for 
each array, and can store results into several output arrays if the function being 
mapped returns multiple values. The difference in execution times between the two 
methods is small, but the second method uses much less array storage, and has a 
much shorter source. 
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6. Experiments 


It has been stressed that edge detection is only the first stage in a vision system 

and that the performance of the detector can oniy be gauged in its context. It has 

a so been argued that the requirements of many of the later modules are similar 

to the extent that it should be possible to design a detector that will work well in 

several contexts. Starting from this assumption we proceeded to design a detector 

based on a precise set of performance criteria which seemed to be common to these 

later modules. We saw in chapters 2 and 4 that it was difficult to capture exactly the 

mtu.t.ve’ criteria that we originally defined. It is virtually impossible to capture 

all of the desirable properties of edge detection in a finite set of criteria and in 

the final analysis the only valid criterion is the performance of the detector on real 

data. This chapter is concerned with evaluating performance at the experimental 

a 76 ’ and Wl1 ' mclude com Parisons with some other edge detection algorithms. The 
evaluation is in three stages: 

(i) Validation of the analytic performance criteria. The operator has been designed 

to optimally detect step edges in Gaussian noise. It should perform well on 
synthetic images of steps. 

(H) Subjective evaluation of performance on real images. The intention here is 

to verify the operation of various parts of the algorithm, in pafticular the 

integration of d.fferent operator widths and orientations. It is not possible 

to validate these by inspection of the detector output, but defects in their 

operation can often be isolated in this way. That is, we cannot tell by looking 

at the output whether the detector is working perfectly, but we can often tell 
where it is failing. 

(iii) Evaluation of detector performance in some context. Since an edge detector 

■s the first stage in many vision programs, it is appropriate to compare edge 

detectors by comparing the performance of the program which uses the two 

detectors. If the original assumption that many vision modules have similar 

requirements is valid, a detector designed using these criteria should perform 
well with all modules. 
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Finally we will present some simple demonstrations of properties of the human 
visual system which are consistent with the edge detector presented here. While 
this does not prove that the human visual system performs the same computations, 
it does suggest that the two systems share a common set of goals. It reinforces the 
choice of performance criteria and gives evidence that an edge detector designed 
using these criteria can perform well on a great variety of images. 

6.1. Step Edges in Noise 

In chapter 7 we will demonstrate that a directional second derivative edge 
operator gives better localization than the Laplacian when applied to a Gaussian 
smoothed image. We have also claimed (chapter 2) that a difference of boxes operator 
gives unacceptable multiple response performance to a noisy step edge. We should 
test those results experimentally now. In figure (6.1) We have a two-dimensional step 
edge with additive white Gaussian noise. The successive frames show the responses 
of difference of boxes, Laplacian of Gaussian and directional first derivative of 
Gaussian operators. The signal to noise ratio of the image, defined as the ratio of 
the amplitude of the step to the standard deviation of the noise at each pixel, is 
about 0.2, and the image is 256 by 256. The Gaussians for the Laplacian and first 
derivative operators both have a a of 8.0 pixels, while the box masks have a length 
of 32 pixels in x and y. 

Processing of the output of the difference of boxes operator after the convolution 
step is identical with the directional first derivative operator, that is, it is 
non-maximum suppressed and thresholded with hysteresis. For the Laplacian of 
Gaussian, the sequence is slightly different. Edges are initially marked at the 
zero-crossings in the convolution output, then the gradients of the convolution 
output are computed and the gradient magnitude is thresholded with hysteresis. 

For each operator there are - two thresholds that are set empirically to give the 
best subjective output. Figure (6.1) gives a guide to the localizing and multiple 
response performance of each of the operators. As we would expect, the directional 
first derivative operator gives subjectively better localization than the Laplacian, 
and the difference of boxes produces several contours in response to the single edge. 
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Figure 6.2. Effect of changing operator thresholds. In the first row, the 
thresholds are increased by 50% , and in the second row they are reduced by 50%. 
The order of operator outputs is (from left to right), (i) Laplacian of Gaussian, (ii) 
difference of boxes and (iii) first derivative of Gaussian. 


89 










To gain some idea of the detection performance (i.e. signal to noise ratio) of 
e operators, we can see how their outputs vary as we change the thresholds from 
e opt,mum values. The first row of figure (6.2) shows the result of increasing 
res o s by 50%, and in the second row the thresholds are reduced from 
from the optimum levels by 50%. From these we can infer that the difference of 
oxes operator has the best signal to noise ratio, since for all threshold levels it 
marks edges only m the vicinity of the step. Of course signal to noise ratio is oily 
one component of detection performance, and lack of multiple responses is the 

1 Ih V M ^ the differenCe ° f b0X6S Perf ° rmS ^ P ° OTly ’ — as 
the thresholds are lowered. The Laplacian of Gaussian exhib.ts poor signal to noise 

ra ,o compared to the other two, and it is not possible to set the thresholds so that 

e ull length of the edge contour is marked without introducing contours due to 
noise. 

It may be argued that the problems with Laplacian of Gaussian or difference 
o oxes operators can be circumvented by applying “pruning” heuristics to their 
outputs. For example, it may. be argued that it is possible to eliminate erroneous 
maxima m the d.fference of boxes output that are “near” the edge, or to use the 
outputs of different Laplacian of Gaussian channels to reinforce the evidence of 
- e ge. This argument misses the point. The optimal operator derived here, or 
e first derivative of Gaussian approximation to it, gives the best performance for 
a single hnear operator when followed by non-maximum suppression. In order to 
improve the performance of the other operators, non-local predicates have to be 
applied which m a sense make the filtering step redundant. 

6.2. Operator Integration 

We have argued that in order to handle a variety of images, an edge detector 
hould incorporate operators of different widths. We have also argued for highly 
mec .onal masks when they are applicable. All of these operators respond to the 
same ypei of feature, a step edge, and where several of them respond to the same 
edge^the detector must mark a single edge only. The problems with the integration 
o , erent operator outputs are very great. In fact many of the arguments against 
irec ,ona operators have been pragmatic, that it is difficult to combine oriented 
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operators and produce a coherent output. The problems with combining operator 
outputs of different widths are even worse, because maxima in the outputs of two 
operators responding to a single edge may be displaced from each other. Chapter 
3 described feature synthesis as a method for combining several feature detector 
outputs. This method is used in the implementation of the edge detector, and we 
now explore how well it performs on some test images. 

6.2.1. Integration of Different Mask Widths 

The reader can readily gain an appreciation of the variety of detail that occurs 
at different scales in an image by reference to figure (6.4). This figure shows the 
edges marked by two operators on an image of a perforated cleaning cloth. The 
mask widths are a = 1.0 and a = 5.0 respectively. The edges at the two scales are 
virtually independent. In contrast, figure (6.7) shows the edges marked by operators 
with o = 1.0 and a = 2.0 on an image of some mechanical parts. In this image, 
almost all of the detail is picked up by the smaller operator, and it only fails on 
some of the shadow boundaries. These two figures capture the essence of the feature 
integration problem. Ideally every feature in the image should be marked, but only 
once. 

Our basic width selection criterion, that is using the smallest operator that has 
sufficient signal to noise ratio, should do the right thing on the parts image. The 
edges at the smaller scale will first be marked, then the larger operator output will 
be synthesized from them, and finally any edges in the large operator output that 
are not consistent with the synthesized output will be added. The result is shown 
in the figure (6.8a). The only difference between this and the figure (6.7a) is the 
addition of some shadow edges, and the extension of some shading edges. Similarly 
the figure (6.5a) shows the combined output from the edges in figure (6.3). Since 
the long shading lines in the image are not seen by the smaller operators, the large 
operator features that correspond to them are not synthesized. When they appear 
in the actual large operator output they are marked in the detector output. 

There is some freedom with the feature synthesis approach as to the “inhibition” 
effect of the synthesized operator outputs. Recall that the actual operator output 
had to be significantly greater than the synthesized output for an edge to be 
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Figure 6.3. (a) Cleaning cloth image 
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Figure 6.4. (a) Edges from cleaning cloth image with operator width a 
(b) edges from operator with a = 5.0 
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Figure 6.6. (a) Parts image 
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Figure 6.7a. Edges from parts image with operator width <x = i 
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Figure 6.7b. Edges from parts image with operator width a = 2.0 
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Figure 6.8b. Superposition of the edges for parts image 








marked. In the present implementation the vector difference between the actual 
and synthesized gradient is first taken and if the magmtude of this difference is 
greater than the magnitude of the synthesized gradient, a new edge is marked. By 
m reducing a scale factor into the comparison, the likelihood of a new edge being 
marked can be varied. This factor must be determined empirically. The above two 
images place confhctmg requirements on the factor. If it is too large, the fuzzy 
e ges in the cloth texture are missed. If it is small, there is duplication of edge 
points in the parts image, and consequent smearing of the edge contours. A single 
va ue was found which gave the results shown in the two figures. This value has 
given good results on all the images tried, and does not require “tuning- for a 
particular .mage. For comparison, below each of the combined edge maps in figures 
(6.5) and (6.8) is the superposition of the edges from which the maps were formed. 

Two final notes. In order for feature synthesis to be effective on the cloth 
exture image, the edges at the smaller scale should be produced by an operator 
which ,s insensitive to slow gradients as described in section (3.5). Otherwise the 
synthesized large operator output will contain a slowly varying component that will 
prevent new edges from being marked, even though the small operator did not mark 
e s ow edges. This component is due to the slow gradient “leaking through” the 
arge number of closely spaced edges from the smaller operator. In general feature 
synthesis is most effective when the two features are independent. 

Also the combined output exhibits streaking of the long contours. This is 
ecause a single scale factor is presently being used for comparison of real and 
synthesized outputs. This factor may be viewed as a kind of threshold, and therefore 
.mproved performance should be possible by using two values, i.e. by thresholding 
with hysteresis. This has yet to be tried at the time of writing. 

6.2.2. Integration of Directional Masks 

Recall from section (3.2) that highly elongated directional operators are 
Purred whenever they have sufficient goodness of fit to the image. The goodness 
Of fit measure is simply the standard deviation of the gradient values at several 
points along the length of the directional mask. A directional operator can only 
mark an edge if the sum of these values exceeds some fixed multiple of their 
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standard deviation. This prevents a directional operator from responding to curved 
edges or from extending edge contours beyond corners. 

It also makes operator integration much easier. For one thing there will seldom 
be more than two directional operators responding to an edge of a particular 
orientation. Also the edges points produced by directional operators will not be 
displaced significantly from the edges produced by less directional operators if both 
have the same width. This is because the only way an edge point can be displaced 
is if the edge is significantly curved, but this will immediately prevent a directional 
operator from responding to it. Feature synthesis is not necessary for directional 
operator integration, and a simple non-maximum scheme suffices. 

The non-maximum suppression scheme was described in section (5.3). The 
only peculiarity of non-maximum suppression for directional operators is that 
the direction of non-maximum suppression is fixed a priori for each mask. It is 
normal to the long axis of the mask. Once all edges have been marked by the 
directional operators, the simple gradient magnitude is computed. A composite 
gradient is formed as the maximum of the simple gradient and the magnitude of 
the directional gradient. This composite gradient is then non-maximum suppressed 
in the gradient direction. The effect of forming a composite gradient is to prevent 
simple (non-directional) edges from being marked adjacent to directional ones. An 
example of the performance of this scheme is given in figure (6.10). The first frame 
of (6.10) shows the edges marked using simple gradient non-maximum suppression. 
The second frame shows the addition of directional operators at the same scale. 
Several additional elongated edges are visible in the second frame. There is no 
evidence of straightening of curved edges or extension of edges beyond corners. 

The edge detector has been used quite extensively by the author in practical 
systems. The first of these is a contour tracker which locates an edge contour in 
an image from a television camera, and plans a trajectory for a robot manipulator 
which follows the contour. The second system forms polygonal approximations to 
the bounding contours of objects in an overhead image of a robot’s workspace. An 
example of this system is given in figure (6.11). These are then used to plan paths 
through the workspace which avoid the obstacles. It has also been used by others 
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Figure 6.9. Dalek image, approximately 700 by 500 pixels 
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Figure 6.10a. Edges from 



image at a — 2.0 
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in the context of shape description (Brady and Asada 1983). It has been used as 
a front end for an implementation of the Marr Poggio stereo algorithm (Grimson 
1981) but a quantitative comparison with Laplacian of Gaussian zero-crossings in 
this application has not yet been done. We close this section with examples of the 
edge detector output on (as promised) a variety of images. These appear in figures 
(6.12) through (6.15). The images are all roughly 700 by 500 pixels and the time 
to process each one was about 10 minutes on an MIT Lisp machine with no special 
hardware. 
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Figure 6.12a. Quasimodo image 
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Figure 6.13b. Edges from Kent image using operator width a 



Figure 6.14a. Westminster image 
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Figure 6.14b. Edges from Westminster 


image, operator width o = 1.0 





























































Figure 6.15a. Marine image 
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6.3. The Line Finder 


An optimal operator for ridges was derived in chapter 4. It was pointed out 
that the extension of this operator to two dimensions was more difficult than the 
edge detector because of the lack of a natural property (such as the gradient) which 
could be used to determine the ridge orientation. The ridge detector must rely on 
a much more noisy quantity (which is approximately the direction of maximum 
curvature) to perform non-maximum suppression. Printed text is a difficult test 
case for a ridge detector because of the variation in orientation and width, and the 
presence of junctions. The ridge detector output on some printed text is given in 
figure (6.16). It uses a second derivative of a Gaussian to approximate the optimal 
operator derived in section (4.1). For reference, the edge detector output on the 
same image is given in figure (6.17). The ridge detector output is subjectively more 
legible, but in many places sections of contour are missing. 

In principle it should be possible to incorporate the ridge detector output in 
the step edge detector using feature synthesis (or some other feature integration 
approach). This has not been done to the present time and remains a challenge to 
low level vision schemes. 
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Some previous formulations have chosen the first or second dcrivaJ 
(ppropriaUi quantity to characterise step edges, and have formed opt, mat J 

f ' hlS dcri '' ?l ' vc 0TCr S0lnc *wpporl. Ivxamplcs of first derivative operate] 
Ipcrators of Roberts (I0GS) and Maclcod (1070). while Modestino and Frj 
Ued 8.0 optimal estimate or the two-dimensional Uplacia.n over a UvJ 
larr and Hildreth (1.980) suggested the Laplacian of * broad CaassiaJ 
ptimucs (he trade-off in localiiaiion send bandwidth. There is a secoj 
lass of formulations in which the image surface is approximated by a J 
Uclloos and the edge parameters arc estimated from the modelled icaaj 
Cxacoplcs of this technique include Hie work, or Prewitt (jQTO), llucckd ( 
fardick (I0S2J These oricLbods allow more direct cstimalcs of edge propc 
U position and orientation, but sioce the basis functions we usually not 
jbe properties apply only to o projection of tbe actual image surface 

Figure 6.16. (a) Text image, (b) negative ridge detector output using a = 0.7 


Figure 6.17. Output of the edge detector on the text image, 





6.4. Psychophysics 

We have made a case for'a particular set of criteria on effective edge detector 
and we have claimed that these criteria are common to a variety of applications, 
n heory any edge detector which is used in these applications should use similar 
criteria, and an algorithm which is consistent with these criteria. The human visual 
system provides structural information about the visual field to later processes, and 

uman beings are adept at stereoscopic depth perception. It is reasonable to argue 
that it should perform edge detection at an early stage. 

It is also reasonable (from the arguments given in chapter 3) to suggest that 
■ ■ ould use a variety of operator widths and orientations. It should therefore give 
pre erence to small operators whenever they have sufficient signal to noise ratio. 

o test this hypothesis we need somehow to produce an image which has different 
detail at two scales, and then add noise to see if the percept changes. Such an 
image is the coarsely sampled picture of Abraham Lincoln used by Harmon and 

73). The effect of the coarse sampling is to introduce irrelevant detail at 
sma scales. The detail makes the image difficult to perceive unless blurred. 

The same effect should be observed if we add noise to the image, because 
he signal to no.se ratio of the small operators will become intolerable before the 
arger ones. Therefore the smaller channels should be ignored at high noise levels, 
while the larger channels will still contain coherent information. A coarsely sampled 
.mage of a well-known stereo-type (not Lincoln) is shown in figure (6 18) The 
successive frames have not been blurred but contain increasing amounts of additive 

* ' Ga “ DOiSe - The kter fr — «■ eas * er to perceive as a human face. We 

erefore have the remarkable situation that adding incoherent noise to such an 
image makes it more perceptible. 

A second result of the analysis in chapter 3 is that highly directional operators 
have better signal to noise ratio than less directional operators. The highly directional 
operators will not be as sensitive to rapid changes in the orientation of an edge 
contour, and will tend to make a rapidly changing contour appear straighten Figure 
(6.19) contains an series of parallel lines which are locally curved but globally 
straight along their length. The lines are closely spaced so that larger channel 
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Figure 6.18. Image of a human face with varying amounts of additive white 
Gaussian noise. 


widths (which would also tend to give a subjective straightening of the contour) 
will have poor signal to noise ratio. When noise is added to this image, there is an 
apparent straightening of the lines. 

This would indicate a scheme which, in contrast with the present detector, 
gives preference to less directional operators when they have sufficient signal 
to noise ratio. However the apparent inconsistency can be resolved if we use a 
more sophisticated applicability test for the directional operators. In the present 
algorithm, directional operators would not be applicable in either of the frames in 
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directii^Md addiUve GaussiarTnoise^^ Paral ' el with sinusoidal variation in 

!? ure 

he simple standard deviation apnlicabilitv mo 
not straight or b u . P y measure ,s poor if the edge contour is 

straight or breaks at a corner. It is also poor if the image is noisy but in th' 

case a direct,onal operator is no less applicable. If image noise is take „ ^ acco 

m 1 6 “ metriC ' ^ W ° U,d «***» of noise to enh “ 

applicability of directional operators, consistent with ( 6 . 19 ). 
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7. Related Work 


Now that we have examined in some detail the design of an edge detector 
using variational techniques, we can contrast it with other schemes with respect 
to both its design goals and the methods used to attain those goals. In order to 
consider any appreciable fraction of the great variety of edge detection schemes that 
have been proposed it is necessary to form a categorization of these schemes. Most 
schemes in fact do not lie wholly within one of these categories, but retain aspects 
of several. We will examine several detectors based on their apparent commitment 
to the following goals 

(i) A decision as to the presence of an edge and an estimate of its location from a 

best-fitting surface that approximates the real image surface. 

(ii) To optimally estimate some derivative, usually first or second, at each point in 

the image and mark edges at local features in these derivative outputs, e.g. 
zero-crossings in second derivative or maxima in first derivative. 

(iii) Frequency domain techniques, which attempt to enhance edges by filtering. 
Here the filters are designed using frequency domain techniques to optimally 
discriminate step edges from the background, by assuming some frequency 
distribution for the background. 

All comparisons will be theoretical and generally quantitative, since for early 
vision level of performance of an algorithm can be crucial. For a more extensive 
survey the reader is referred to Davis (1975). For experimental comparisons the 
reader should see Fram and Deutsch (1975), which compares several operators 
applied to step edges in noise, and includes a comparison with human performance 
on the same synthetic images. Abdou and Pratt (1979) compare local differential 
and template matching operators based on a figure of merit which is very similar 
to the performance criterion used in the present detector. This figure of merit was 
introduced by Pratt (1978, p495) and is given by 

1 7a 1 

F = - V - 

max(J/, I A ) tx 1 + <*d 2 (i) 
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the trade-off between detection and localization. 

7.1. Surface Fitting 

There are a number of edge detectors that are based on some kind of image 
SUr a “ lmg ‘ TheSe meth ° ds USUa "y ™°lve an initial parametrization of the 
:i: 0f SOme Set 0f basis ™°wed by the estimation 

the am pht de a» d position of the best _ fitUng step edg£ ^ ^ 

ne of the earhest examples of this method was the Prewitt operator (1970), which 
used a (jumlratic set of basis functions. Another ear, y example is the detector of 
ueckel (1971). Hueckel's method uses basis functions with circular support, and 
nes o t a single step edge to each circular area. The basis functions are chosen 
so as o give an approximate Fourier Worm of the circular region. However 
as w,th most surface fitting schemes, the basis set is not complete (there are only’ 
asis functions over a support of 52 pixels) and an edge is actually fitted to a 
smoothed version of the original surface. An argument is presented to the effect 
hat the choice of a low-frequency subset of the complete space of basis functions 
does not prejudice the ability of the operator to detect and localize edges, but proof 
of this ,s not given. Instead it is argued that the high-frequency components should 
be ignored because they will contain much of the image noise. 

Another example of this approach is the work of Haralick. In Haralick’s 1980 
article, he proposes a fitting of the image by small planar surfaces or -facet.” 
ges are marked at points which belong to two such facets when the parameters of 
e tw surfaces are inconsistent. The test for consistency is based on the goodness 
f of each surface within its neighbourhood and uses a squared statistic. Again 
e initial surface fitting invqlves a set of parameters which do not completely 
represent the image surface. In this case there are 3 parameters over a square 
support of somewhere between 4 and 25 pixels. The three parameters are in fact 
estimates of the x and y slope and the average value over the support. 

In subsequent work on edge detection using a more general surface fitting 
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technique, Haralick (1982) has used higher order polynomial basis functions with 
larger operator supports. The later scheme locates edges at zero-crossings in the 
second derivative of the modelled image surface in the image gradient direction. It 
uses cubic polynomials (in x and y) as the basis functions over a square support of 
(typically) 121 pixels. Interestingly, this choice of basis functions yields an operator 
which can be shown to be quite similar to the operator described in this report. 
However, if higher order or lower order polynomials are used, performance will be 
worse. We now demonstrate this similarity. 

The polynomials used are the discrete Chebychev polynomials, denoted Pi(r), 
and for simplicity we will consider a one-dimensional problem. The objective of 
surface fitting is to find the coefficients a* such that the sum 

<9(r) = E a > p i( r ) ( 7 -8) 

i=0 

gives the best square-error fit to the actual sampled image surface I(r). That is we 
seek to minimize the value of 

e S = E (/(r) - E aiP,(r)Y (7.9) 

r=-J? V i=0 ' 

We do this by setting to zero the partial derivatives of e 2 with respect to each 
of the ay. 

R fi 

E fiw - E = 0 

t=—R x i=Q ' 

t 

This leads to the solution of a system of linear equations in the ay, but in the 
case where the polynomials are orthogonal the system is diagonal and the solution 
is simply 


where 


a, 


Pi 


R 


E WM 

r=-H 


(7.10) 
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* = E p ) W 

r=-/? 

The method then estimates the first and second partial derivatives of this 
modelled surface. For example the first derivative is 


Substituting equation (7.10) into (7.11) we obtain 


(7.11) 


-<*0) = t i e 

J—0 P] r=—R dr 


( 0 ) 


(7.12) 


he important thing to note about this equation is that it is linear in the 
sampled ,mage intensity, and that therefore the operation of surface fitting followed 
by derivative estimation can be represented as a single convolution. We find the 
equivalent filter for this convolution from (7.12). Since this expression has the form 
of a discrete convolution over r, by removing the summation over r and the input 
erm /(r), we obtain the impulse response of the equivalent filter 




(7.13) 


he derivation of the expression for the second directional derivative is similar 
'he next step in the surface fitting approach is to mark edges at zero-crossings 

“ SC dlreCti ° nal d6riVative - These wi “ -respond to maxima in the first 
derivative given above. This puts us in the position of being able to directly compare 

the surface fitting approach to the variational approach described in this report 

oth methods are effectively marking edges at the maxima in the output of the 

convolution of the image with some linear operator. We can use the optimality 

on -a that we defined for step edges to (analytically) evaluate the performance of 
the surface fitting operator. 
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The form of the equivalent filter depends directly on the choice of basis 
functions P*. The equivalent filter for the Chebychev basis polynomials up to 
degree three is shown in figure (7.1). It turns out that the choice of cubic basis 
polynomials gives the best approximation to the optimal operator derived in this 
report. The perceptive reader may note that the surface fitting and gradient 
estimation procedure is equivalent to convolving with a function that is the best 
approximation to a derivative function (within the constraints imposed by the basis 
functions) i.e. the filter has an impulse response that is the first derivative of a 
delta function. Reference to (7.13) shows that as the order of the basis functions 
becomes large, the filter /(r) tends to a simple local gradient estimator, similar to 
the 3-pixel Prewitt operator. The equivalent filter for n = 7 is shown in the second 
frame of figure (7.1). 

Thus the 3-order Chebychev polynomials give the best performance with this 
approach, while higher order polynomials lead to operators that are approximations 
to local derivative operators. This answers one of the questions raised by Haralick 
in his article as to what order of polynomial functions is best. The answer to the 
other question raised, viz. what form of basis functions to use, can also be answered, 
since his criteria of performance are essentially the same as ours. Note that these 
criteria were used to experimentally evaluate the performance of the surface fitting 
operator, but did not appear explicity in the design. The optimal surface fitting 
operator for step edges would use a single basis function which is the first derivative 
of Gaussian derived here. Fitting and gradient estimation using this single basis 
function is equivalent to convolution with the same function. 

So we see that the ultimate performance of the surface fitting approach is 
determined entirely by the choice of basis functions. However, no analysis was done 
in Haralick (1980) or in Huedkel (1970) as to the optimality of their respective 
sets of functions. Other advocates of the surface fitting approach have made more 
detailed analysis of the basis functions. For example Hummel (1978) suggested the 
use of Karhunen-Loeve principal components for the basis functions. 
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a 



to degree 1 ? (b)' EquivaIent filters for cubic basis functions (a) and for basis functions 


7.2. Derivative Estimation 

Since an ideal step edge is a rapid transition from one intensity value to another, 
eems that a reasonable way to detect edges is to estimate some derivative of the 

Z T".! r? FirSt deriVatiVe ^ b - « by Roberts 

(196 ), Prewitt (1970), Rosenfeld and Thurston (1971), Macleod (1970) and a 

vanety o others. There has also been some interest in operators that estimate 

~ d : Vat,Ve ° f the image intenSity ' The ° Perat ° r ° f Mod -tino and FYies 
( 77) estimates a Gaussian smoothed Laplacian using a computationally efficient 
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recursive filtering algorithm. Herskovits and Binford (1970) used a form of lateral 
inhibition to reduce the sensitivity of their operator to slow gradients, and then 
followed with first and second derivative estimation to locate the edges. Recently 
there has been interest in operators which locate zero-crossings in the second 
directional derivative in the gradient direction, viz. Havens and Strikwerda (1983), 
Torre and Poggio (1983), Yuille (1983) and Haralick (1982). 

We should note that the operator derived in chapters 2 and 3 has very strong 
similarities to two of the above operators. In particular, we have been using the first 
derivative of a Gaussian to approximate the optimal operator derived in chapter 2. 
The simplest two-dimensional extension of this used a Gaussian projection function, 
which results in a two-dimensional operator which is very similar to Macleod’s. 
It also bears a strong resemblance to the Marr-Hildreth operator, at least in one 
dimension, as we shall see in a moment. 

It has been argued in this report that the optimal edge detection function 
should be asymmetric (see section 2.1), and it may therefore be viewed as a first 
derivative operator. However, it was not designed to optimally estimate gradient, 
but to detect step edges. This distinction is subtle, but it should be stressed at this 
point. The argument for derivative estimation is that the image gradient attains a 
maximum at the centre of a step edge, and that therefore edges may be detected 
by finding maxima in gradient. However, it does not follow that gradient is the 
best measure to use to detect and localize edges. Marr and Hildreth (1980) suggest 
the use of the slope of the output of the Laplacian of Gaussian operator. Again the 
observation is that this quantity is proportional to the edge strength. 

We should really be trying to estimate the “edgeness” of a potential edge. 
However such a measure can only be defined implicitly by a variational equation, 
such as equation (2.12). The tendency has been to use a posteriori measures, such 
as gradient or zero-crossings of some derivative, as evidence of edges. The fact 
that high gradients occur near edges do not mean that all points of high gradient 
correspond to edges. 

Since in one dimension the zero-crossing of second derivative operator (Marr 
and Hildreth) is essentially the same (ignoring the thresholding question for the 
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" ‘ m "”’ 0r ‘"“ d, ™ ,m >**“«. -d both employ Gaussian 
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wo dimensions the extensions are different and nerf 
different. We now show that ,h i , Performance is noticeably 

now show that the two d.mensional Laplacian of Gaussian gives 

Poorer localisation than the directional operator described in chapter 3 Let the 

wo- imensional Laplacian of Gaussian be described by the equation 

Ux,y) = ( x2 + y 2 0 \ ( 

j ^ * 2 ~ 2 M~~#-) (7.14) 

the s'" ^ the meth ° d ° f ChaPt6r 3 ’ the St3ndard d6Viation ° f position of 
th -crossing , the quotient of the slope of the sero-crossing at the edge centrl 

0 7 ;, r ° 0t SqUar6d iD the <>P-tor output. Let the input be a step 

amp i u e in the y direction, i.e. the equation of the input S(x, y) is 


S(x,y) = Au-fa) 


Then the slope of the zero-crossing is 


f-oo /-oo AL i x >y)dydx 


d f x o f+OO 

dxQ J —OO J —oo 


and the root mean squared output 


noise is 


(7.15) 


noo 


/ +OO /•-f-oo 

-oo J -oo L 2 ( x >y)dydz 


* 


(7.16) 

Dividing (7.15) by (7J6) and aub.iitudng from (7.14) mi find lh , , 

localization Ap the L,plaei„ ,, “» 


A L = 1 


We 


U,, ””7 ,* lh - «« aligned „ ilh 

' h ' P ° m ‘ ,pre,d operator be d,„itad by action 
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(x 2 \ ( x 2 -\-y 2 \ 

*>*(*•»> = {^- 1 r p { — ) 


(7.17) 


Again we find the quotient of (T.15) and (7.16) and substitute from (7.17) and we 
obtain for the localization of this operator 



So on average we would expect the positional error of the Laplacian of Gaussian 
to be about 60% greater than that of a directional operator of the same a. 

It has been suggested that the strength of a zero-crossing may be estimated 
from the slope of the zero-crossing (normal to the edge direction). We should also 
compare the signal to noise ratio of this measure with signal to noise ratio of the 
first directional derivative. The slope of the zero-crossing of a Laplacian of Gaussian 
is again given by equation *(7.15), while the noise in this value can be found from 

the integral 


noo 


f + OO H-OO f d y 

Loo Loo {^ X ' y) ) dydX 


(7.18) 


This gives the Laplacian of Gaussian a signal to noise ratio El, formed from 
the quotient of equations (7.15) and (7.18), of 


El 


-i/l' 


Finally we compare this value with the signal to noise ratio of the two- 
dimensional directional derivative operator 


D(x, y) = x exp(- 


i 2 + y 2 
2 a 2 


) 


which turns out to have a E value of 
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The above result shows that the slope of a zero-crossing is a very poor estimator 
or edge strength. While there are other possible choices for the edge amplitude 
estimator, we also find that the Laplacian of Gaussian still suffers in localization 
y comparison with a directional operator. The intuitive reason for this is that the 
two-dimensional Laplacian may be decomposed into the sum of 

■n (any) two orthogonal directions. If one of these is chosen to be normal to the edge 
.rectmn, ,t ni clear that this contribution is exactly that of a directional operator 

nothing ^.TT’ “ ^ ^ * * - —- Chutes 

nothing to localization but will increase the amount of noise. 


7.3. Frequency Domain Methods 


n this (rather small) category, we find one particular example of an approach 
.c use criteria very similar to ours, using a frequency domain derivation We 

it Id rtut l, ; h f ° rmUlati0n t0 l6ad t0 “ « -y similar to ours. In fact 

hi l U * S ^ dUe t0 3 rather Unfortunate restriction placed on the 
Possible solution. When the restriction is removed, we again obt 1 a function 

which approximates a first derivative of a Gaussian. 

The „,thod I, that of Shannon, ,, (lm) _ . lb p „ p „ rf ^ ^ 

-o- .»«».,on,l linear operator that .ppm™*, L .pl,ei„ „ , 

Then ,r,ten, o, op.imalit,, „„ , h , ' 

21 TZ convolved „,ft a .ftp 
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capture those of the present design. Maximizing the proportion of total output 
energy in the interval will limit the range over which the maximum in the response 
to a step edge can occur. Band-limiting greatly improves output signal to noise 
ratio, since the spectrum of Gaussian noise is flat while the spectrum of a step edge 
varies as the inverse of frequency, i.e. most of the energy in the edge is concentrated 
at low frequencies. 

Unfortunately, there are two steps in the method of Shanmugam et al which 
the present author finds hard to justify. The first is that they made no attempt 
to mark edge points, but instead thresholded filtered values were output. In fact 
their filter gives two peaks in its response to an ideal step, but these are on either 
side of the centre of the step, and the response at the centre is actually zero. This 
was rectified to some extent in the work of Marr and Hildreth (1980) who used the 
zero-crossings of the same filter, since these features occur at the centre of step 
edges. 

A second problem is that they assumed a priori that the operator they were 
looking for would be trivially extensible to two dimensions by rotating it about 
an axis of symmetry. This immediately restricted them to symmetric operators, 
even in one dimension where the design was done. As we have seen in chapter 3, 
this restriction is unnecessary, and actually degrades performance. In fact if the 
restriction is removed, the same analysis leads to an operator that approximates 
the first derivative of a Gaussian, as used in the current design. We repeat their 
design without the assumption of symmetry now. 

Once again we perform an optimization to find the function that extremizes 
one criterion while another is kept constant. In this case the bandwidth of the 
response H will be fixed while the fraction of total output energy in an interval 
[—//2,-f-//2] is maximized, i.e. if the output response is g(x) and the fraction of 
the energy in the interval is a,‘ we maximize 

/-//2 IffWP dx . . 

a = —r~- (7.1) 

/-OO IsWI dx 

We can make use of the fact that there exists a set of functions ipi(x) the 
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interval [^22 ^ "* band ' Hmited ’ orth °g°™l over the 
I A +1/2] and orthonormal over (—oo, +oo). i e 


C *(*)*>(*) * = I 0, ’ * j 

U, * = j 

dx = 1 °’ 

iXj, t = j 


*>> = 0,1,2,... 


*»J = 0,1,2,... 


(7.2) 


(7.3) 


with X, < 1 for all i, and X 0 > Xi > \ 2 > 

The prolate spheroidal wave functions are complete in the space of band-limited 
unc ions, an hence the output from any band-limited filter can be represented as 


OO 


0( x ) — 52 ani>n{c, x) 
n=0 


(7.4) 


Where the constant 


C is a function of the bandwidth and the size of the interval 


c = 


ni 

2 


Wto th, expulsion f« *„ „ ^ (7 ,„ u , ing 

and (7.3), the value of a becomes V ' 



E“=o K| 2 X„ 


The X, are all positive and X 0 > Xj > X 2 > 


so a is bounded by 



The 


0 < a . < — £-*>=0 l a " l 2 _ > 

E„°L 0 |a n p ~ x ° < i 

upper bound is attained when a n = 0 for n > n m th *• , 

n w ior n > u, so the optimal output i 


9(z) — aoi/>o(c,x) 
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Since this is the desired step response of the filter, we can obtain the impulse 
response f(x) by differentiation. 

f(x) = a 0 ^^o(c,x) (7.7) 

An approximation to the functions ^> n (c, x ) due to Slepian (1965) can be used to 
find a closed form expression for f(x). 

2 

Me, z) = (^) i 2 -J (n!)-iH n (cii)exp(^|-) 

where H n (x) is a Hermite polynomial of degree n. This approximation is useful for 
x < c ~ 1 / 4 and n < c. So V’o(c, x) may be approximated by a Gaussian for small 
x , and the optimal spatial function /(i) will be the first derivative of a Gaussian 
as before 

/(l) = (kx) exp( ^~) 

In their original article, when Shanmugam et al (1979) assumed that the 
function f(x) should be symmetric, they were restricted to the odd prolate spheroidal 
functions ignoring ipo(c, x) which in fact gives the best performance. The X*(c) may 
be used as performance indices since they measure the fraction of the total energy in 
the specified region for the corresponding The values of \o may be significantly 
higher than those of \i for small values of c. The small values of c imply that the 
product of spatial and frequency extent are minimal. For example at c = 0.5 the 
value of Xo is about 0.3 while the value of Xi is 0.0086 (see Slepian 1960). 

The intent of this chapter has been to put the present edge detector in 
context with several other well-known schemes. We have seen that there are strong 
similarities in analytic form with several of these schemes, in particular with the 
detectors of Marr and Hildreth, Macleod, Haralick and Shanmugam et al. There 
are also important differences, for example we have not yet considered the use of 
multiple operators or of highly directional masks. Rosenfeld (1971) used several 
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8. Conclusions and Suggestions for Further Work 


We began this report with a precise definition of a set of goals for edge detection 
and proceeded to derive an operator which best achieved these goals. The goals 
were carefully chosen with minimal assumptions about the form of an optimal edge 
operator. The constraints imposed were that we would mark edges at the maxima 
in the output of a linear shift-invariant operator. By expressing the criteria as 
functionals on the impulse response of the edge detection operator, we were able 
to optimize over a large solution space, without imposing constraint on the form of 
the solution. 

Using this technique with an initial model of a step edge in white Gaussian 
noise, in chapter 2 we found that there was a fundamental limit to the simultaneous 
detection and localization of step edges. This led to a natural uncertainty relationship 
between the localizing and detecting abilities of the edge detector. This relationship 
in turn led to a powerful constraint on the solution, i.e. that there is a class of 
optimal operators all of which can be obtained from a single operator by spatial 
scaling. By varying the width of this operator it is possible to vary the trade-off 
in signal to noise ratio versus localization, at the same time ensuring that for any 
value of one of the quantities, the other will be maximized. 

It was then found that the goals as originally specified were not well defined, 
or rather that the analytic criteria did not articulate all that we expected of the 
edge detector. By adding an explicit criterion related to multiple responses, we were 
able to obtain an operator that met all of our intuitive design goals. The multiple 
response constraint did add considerable complexity to the form of the solution and 
in fact it was not possible to realize a solution in fully closed form. However, the 
analysis was able to constrain the solution to a finite (low) dimensional parameter 
space over which a numerical solution could be obtained. The impulse response of 
the operator is a sum of damped exponential cosines, and it can be. approximated 
by the first derivative of a Gaussian. 

We then extended the above operator to two dimensions and in doing so we 
followed the framework that was established for the one-dimensional formulation 
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extension to two d.mens.ons. It was found that multiple operator 
wdths were necessary to deal with different si g na, to noise rat,os in anTle 
ecause of the trade-off described above. We also found that directional operators 

^ ^ adVMta ^ non-directional operators, and that the more directional 
operator ,s the better its potential performance. The traditional problems 
located with hi g h,y directional operators were dealt with by us,n g a oodnl 
o f fi measure to decide whether each directional operator could be used. We 2 

crerent e w C hT' de o ble ^ ^ deSCri ^ * 

erent whole. Once again we used the same set of design goals to guide the 

or choosing the appropriate operator. Feature synthesis was « 

as a means of combining the outputs of operators whose responses to the same 
feature are not necessarily spatially coincident. 

The first selection heuristic was to give preference to operators of minimum 

Pr ° Vlded they had sufficie “‘ s^nal to noise ratio. This gives maximum 
resolution and localization for a given global signal to noise ratio. The combination 

of III” IT WidthS ^ difBCUlt b6CaUSe ° f ^ ° f Spatial «nce 
. . , necessary to use feature synthesis (examples 

given in chapter 6) for the operator integration. 

Th,• h.uristi, was to favour hi e hl, directional op «„ to „ 

have suBccnt o, „t to th, imw . Th , inUvMon of 
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system, „h„h m other d.mons,,,,,, ,imil»„„ * th , ^ 

ere, uses a different (or more complicated) decision procedure. 

To make the convolution of images with the optimal operator more efficient a 
rs denvative of a Gaussian approximation was used. This allowed us to use imy 
efficient algorithms presented in section 3.2 to spe.ed things up. It was found 
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that there exists an approximately “linear” time (in the width of a square mask) 
algorithm for doing convolutions with arbitrary masks, and so efficient convolution 
is possible even without the approximation. Experiments are now being performed 
to determine the practicality of implementing this scheme in hardware. 

In chapter 4 we were able to generalize the method used for step edges in 
white Gaussian noise to arbitrary features and to non-white but stationary random 
noise. In addition to a general form for the criteria, a fast numerical method for 
the solution was described. This technique was then used to find optimal operators 
for ridge and roof features. The ridge detector was extended to two dimensions, 
and this was found to be much more difficult than for the edge operator because of 
the lack of reliable information about the ridge direction. An example of the ridge 
detector output appears in section (6.3), and was compared to the edge detector on 
the same image. 

Finally in chapters 6 and 7 some comparisons were made between the edge 
detector derived here, the Marr-Hildreth (1980) operator and the difference of boxes 
operator. There were both analytic and experimental comparisons. It was also 
compared to two other edge operators, those of Haralick (1982) and of Shanmugam 
et al. (1979), and was found to be similar to all of these in one dimension. Chapter 6 
also included several examples of the edge detector output on some natural scenes. 
Finally we saw in section 6.5 some perceptual effects which seem to indicate that 
the human visual system uses a similar feature combination scheme. 

There are several directions in which the work in this report could be continued. 
The most obvious is probably the area of integration of feature descriptions. The 
algorithm as described in this report includes a feature synthesis method to combine 
output of several operators of differing width. It could potentially be used to 
combine the outputs of detectors for different features, such as the ridge detector 
described in chapter 4. It may be possible to form criteria on the performance of a 
feature integration scheme. Two possible criteria are 

(i) The integration scheme should not miss features. If a single feature is marked 
by one of the detectors, it should be marked in the integrator output. Also, 
if two feature detectors are responding at the same point in a way that is 
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not consistent with there being only a single feature in the image, then the 
integrator should mark both features on the output. 

(n) The integration scheme shou.d produce an output with minimal redundancy 

A. *«„ „ „ iterta (i) won|d 21 

s mp y mark everythmg seen by any feature detector. In this case no integration 

tw 63 U T, r; rmatl0n 18 ° CCUrring ’ 6 ' g ' 11 18 Unnecessar y to mark a ridge as 
two parallel closely spaced edges if it hoc i , ® 

is T , , . . 6 “ 1 has alread y been identified for what it 

• The eature integrator may be viewed as a filter on the outputs of a.l the 

feature detectors which removes redundant information. 

Both of the above criteria require a scheme which is non-local that is f * 
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used in the T ^ T V n ° n ' i0Cal SCheme as is the surfa <* fitting approach 
TOPOgraph,C Primal Sketch of Haralick (1983). It would be worth 
comparing the two schemes according to the above criteria. The difficulty with 

usmg an opt.mizat.on scheme to find an optimal feature integrator is that K 

more complicated image model would be necessary At the & mUC 

f . , , „ , necessary. At the very least it would need 

include a. those features for which the individual detectors were designed Id 

presumably all possible combinations of those features. 

Another possible extension of the edge detector would be to 3 or more 
imensions. We have already seen in chapter 3 that there is a simple extension of 
he optimal operator to „ dimensions. This operator locates n - 1 surfaces (the 

Usir e h nSI hT al , eXtenSi0n ^ 6dge C0Dt0UrS) WhMe dlSC ° ntinuit - in intensity occur 
sing ,g y directional operators is more difficult in this domain because of the 

arge number of directions needed to uniformly cover an n-sphere. Non-maximum 

suppression is also more complicated for the same reason. 

In particular it is possible to consider the detection of moving edges as a three 
dimensional edge detect inn nr^ki , K 6 three 

Edges in this three d ' 

s in this three dimensional space correspond to edges in the two-dimensional 
age or to points of rapidly changing irradiance. The direction of the edge in 
the three-space can be used to determine the velocity of the two-dimensional edge 
constraint that the time-space edge filter must be causal, that is it must 
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depend only on past and present intensity values. Using a filter that is broad in the 
time domain will introduce a delay into the velocity estimate. Thus the design of 
the time domain filter is a separate optimization problem which requires additional 
constraints of causality and minimal time delay. 

One final generalization of the techniques described in this report would be 
to relax the restriction of linearity on the operator. Shift invariance is clearly a 
desirable property of an edge detection operator, but it is not clear that the optimal 
operator must be linear. In fact the composite operator derived in this report is 
non-linear because it involves a non-local predicate (from the feature synthesis 
scheme) applied to several operator outputs. Ideally this constraint should be either 
relaxed or it should be proven that linear operators can perform as well as non-linear 
operators. The restriction to linear operators here was necessary because of the 
sheer complexity of parametrizing a non-linear shift invariant operator in a form 
which would allow variational methods to be applied. It remains to be seen whether 
this restriction penalizes performance, and whether an unconstrained non-linear 
operator can do any better. 
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Appendix I. Definite Integrals used in the Derivations 

These integrals were referred to in chanfpr 9 k„* 

because ft f th • • , b Were n0t lnc,uded there 

oecause of their excessive length. There are 3 inWr. 1 • , 

3 integrals required to evaluate (2.12) 

written in tl ^ ** (2 ' 24) ' ° f theSe 4 inte ^als, 3 can be 

n the same parametric form, because they a!, involve the integra! of the 

square of a function of the form 


ff(z) cie QI sm ujx + c 2 e“* cos ux + c 3 e~ ax sin ojx + c 4 e~ ax 


coscux 


We now define 


Il(C i’ C2 ’ C3 ’ C4) ~ /-i “d /»(ci,c 2> c 3 ,c 4 ) = 


0 2 (x) dx 


And we find that all of the integrals in the performance criteri 
in terms of I x and I 2 thus 


criteria can be written 


f_ x f{ x )dx — Ii{a lf a 2 ,a 3 ,a 4 ) -f- c 


[~H 2 

'-i f(x)dx = -^ 2 ( a i> °2> a 3, a 4 ) + 4c/j(ar, <j 2 , a 3 , a 4 ) -f- 2c 2 

/“H f2 

/' (.) dx = / 2 (a ai - waj, aa 2 + WBl , _ aa3 _ ^ _ Qfl4 + ^ 
[ + 1 ff 2 

f (X) dX = H0ai ~ *•* + 7«i. /5a 3 + 7 a 4) /?a 4 - 7 a 3 ) 

where 
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P — a 2 — u 2 


and 


7 = 2au) 


The closed form for the definite integral I\ in terms of a, u, and c\ through 
C4 and for a unit interval (W — 1) is 


A(ci,C2,c 3 j C 4) = —rfcie a (asin u — wcosw) + o;ci 

Or -j~ u)* \ 

-j-C2e a (o; sin w -f a cos a;) — ac2 
-f-c 3 e — a (asino; + a; cos a;)— a;c 3 
-|-C4e —a (u; sin a; — a cos u) -f- ac^j 

Similarly the closed form for I 2 over the unit interval is 


^2 (ci,C 2 ,c 3 ,C4) = —g—p—^ (^c5e 2at (—aw 2 sin 2a; — a 2 u;cos2a; + u 3 + <* 2 w) — c 2 u 3 

-\-c\ e 2a (ao; 2 sin 2a; -f cos 2a; -f- a; 3 + a 2 u) 

—c^a; 3 + 2a 2 u) 

-f-C3e~ 2a (— au 2 sin 2a; -f* a: 2 w cos 2a; — a; 3 — a 2 a;) + c^a; 3 
-J-c 2 e —2a (o:a; 2 sin 2a; — a 2 a; cos 2a; — a; 3 — a 2 a;) 

-f c 2 (a; 3 + 2a 2 u) 

-f-2ciC2e 2a (o: 2 a; sin 2a; — aa; 2 cos 2a;) + 2ciC2«a; 2 
-j-2cic 3 (a 3 + au 2 )(2u — sin 2a;) 

—4 (ciC 4 -J~ C2C 3 )(a 3 -f- au 2 ) sin 2 w 
-|-2 c 2C4 (q : 3 + au 2 )(2u -f sin 2a;) 

+2c 3 C4e” 2a (—a 2 a; sin 2a; — au 2 cos 2a;) -f* 2c 3 C4aa; 2 l 
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